(54) Method and apparatus for special video reproduction modes

(57) Special reproduction control information comprises a plurality of
items (100) of frame information. Each of the items of frame information
comprises video location information (101) indicating the location of
video data to be reproduced in a special reproduction, and display time
control information (102) indicating the time for displaying the video
data.

Description

[0001] The present invention relates to a special reproduction control
information describing method for describing special reproduction control
information used to perform special reproduction of target video contents,
a special reproduction control information creating method and a special
reproduction control information creating apparatus for creating the
special reproduction control information, and a video reproduction
apparatus and method for performing special reproduction by using the
special reproduction control information.
[0002] In recent years, motion pictures have been compressed as digital
video and stored on disk media, represented by DVDs and HDDs, so that a
video can be reproduced with random access. A video can be reproduced
partway through, from any desired timing, with virtually no waiting time.
As with conventional tape media, disk media can be fast reproduced at two
to four times speed or can be reversely reproduced.

[0003] However, there is a problem in that a video is very long in many
cases, and the viewing time cannot be sufficiently compressed to view the
whole contents of the video even with fast reproduction at two to four
times speed. When the rate of the fast reproduction is increased, the
scene changes become too large for the viewer to follow, so that grasping
the contents is difficult, and even portions which are not needed are
reproduced, which is wasteful.
[0004] Accordingly, the present invention is directed to a method and
apparatus that substantially obviate one or more of the problems due to
limitations and disadvantages of the related art.

[0005] According to one aspect of the present invention, a method of
describing frame information comprises:

describing, for a frame extracted from a plurality of frames in a source
video data, first information specifying a location of the extracted frame
in the source video data; and

describing, for the extracted frame, second information relating to a
display time of the extracted frame.

[0006] According to another aspect of the present invention, an article of
manufacture comprising a computer usable medium storing frame information,
the frame information comprises:

first information, described for a frame extracted from a plurality of
frames, specifying a location of the extracted frame in the source video
data; and

second information, described for the extracted frame, relating to a
display time of the extracted frame.

[0007] According to another aspect of the present invention, an apparatus
for creating frame information comprises:

a unit configured to extract a frame from a plurality of frames in a
source video data;

a unit configured to create the frame information including first
information specifying a location of the extracted frame and second
information relating to a display time of the extracted frame; and

a unit configured to link the extracted frame to the frame information.

[0008] According to another aspect of the present invention, a method of
creating frame information comprises:

extracting a frame from a plurality of frames in a source video data; and

creating the frame information including first information specifying a
location of the extracted frame in the source video data and second
information relating to a display time of the extracted frame.

[0009] According to another aspect of the present invention, an apparatus
for performing a special reproduction comprises:

a unit configured to refer to frame information described for a frame
extracted from a plurality of frames in a source video data and including
first information specifying a location of the extracted frame in the
source video data and second information relating to a display time of the
extracted frame;

a unit configured to obtain the video data corresponding to the extracted
frame based on the first information;

a unit configured to determine the display time of the extracted frame
based on the second information; and

a unit configured to display the obtained video data for the determined
display time.

[0010] According to another aspect of the present invention, an article of
manufacture comprising a method of performing a special reproduction
comprises:

referring to frame information described for a frame extracted from a
plurality of frames in a source video data and including first information
specifying a location of the extracted frame and second information
relating to a display time of the extracted frame;

obtaining the video data corresponding to the extracted frame based on the
first information;

determining the display time of the extracted frame based on the second
information; and

displaying the obtained video data for the determined display time.

[0011] According to another aspect of the present invention, an article of
manufacture comprising a computer usable medium having computer readable
program code means embodied therein, the computer readable program code
means performing a special reproduction, the computer readable program
code means comprises:

computer readable program code means for causing a computer to refer to
frame information described for a frame extracted from a plurality of
frames in a source video data and including first information specifying a
location of the extracted frame and second information relating to a
display time of the extracted frame;

computer readable program code means for causing a computer to obtain the
video data corresponding to the extracted frame based on the first
information;

computer readable program code means for causing a computer to determine
the display time of the extracted frame based on the second information;
and

computer readable program code means for causing a computer to display the
obtained video data for the determined display time.

[0012] According to another aspect of the present invention, an article of
manufacture comprising a method of describing sound information, the
method comprises:

describing, for a frame extracted from a plurality of sound frames in a
source sound data, first information specifying a location of the
extracted frame in the source sound data; and

describing, for the extracted frame, second information relating to a
reproduction start time and reproduction time of the sound data of the
extracted frame.

[0013] According to another aspect of the present invention, an article of
manufacture comprising a computer usable medium storing frame information,
the frame information comprises:

first information, described for a frame extracted from a plurality of
sound frames, specifying a location of the extracted frame in the source
sound data; and

second information, described for the extracted frame, relating to a
reproduction start time and reproduction time of the sound data of the
extracted frame.

[0014] According to another aspect of the present invention, an article of
manufacture comprising a method of describing text information, the method
comprises:

describing, for a frame extracted from a plurality of text frames in a
source text data, first information specifying a location of the extracted
frame in the source text data; and

describing, for the extracted frame, second information relating to a
display start time and display time of the text data of the extracted
frame.

[0015] According to another aspect of the present invention, an article of
manufacture comprising a computer usable medium storing frame information,
the frame information comprises:

first information, described for a frame extracted from a plurality of
text frames in a source text data, specifying a location of the extracted
frame in the source text data; and

second information, described for the extracted frame, relating to a
display start time and display time of the text data of the extracted
frame.

[0016] This summary of the invention does not necessarily describe all
necessary features, so that the invention may also be a sub-combination of
these described features.

[0017] The present invention can be implemented either in hardware or in
software on a general purpose computer. Further, the present invention can
be implemented in a combination of hardware and software. The present
invention can also be implemented by a single processing apparatus or a
distributed network of processing apparatuses.

[0018] Since the present invention can be implemented by software, the
present invention encompasses computer code provided to a general purpose
computer on any suitable carrier medium. The carrier medium can comprise
any storage medium such as a floppy disk, a CD ROM, a magnetic device or a
programmable memory device, or any transient medium such as any signal,
e.g. an electrical, optical or microwave signal.
[0019] The invention can be more fully understood from the following
detailed description when taken in conjunction with the accompanying
drawings, in which:

FIG. 1 is a view showing an example of a data structure of special reproduction control information according to one embodiment of the present invention;
FIG. 2 is a view showing an example of a structure of a special reproduction control information creating apparatus;
FIG. 3 is a view showing another example of a structure of the special reproduction control information creating apparatus;
FIG. 4 is a flowchart showing one example for the apparatus shown in FIG. 2;
FIG. 5 is a flowchart showing one example for the apparatus shown in FIG. 3;
FIG. 6 is a view showing an example of a structure of a video reproduction apparatus;
FIG. 7 is a flowchart showing one example for the apparatus shown in FIG. 6;
FIG. 8 is a view showing an example of a data structure of special reproduction control information;
FIG. 9 is a view explaining video location information for referring to an original video frame;
FIG. 10 is a view explaining video location information for referring to an image data file;
FIG. 11 is a view explaining a method for extracting video data in accordance with a motion of a screen;
FIG. 12 is a view explaining video location information for referring to the original video frame;
FIG. 13 is a view for explaining video location information for referring to the image data file;
FIG. 14 is a view showing an example of a data structure of special reproduction control information in which plural original video frames are referred to;
FIG. 15 is a view explaining a relation between the video location information and the original plural video frames;
FIG. 16 is a view explaining a relation between the image data file and the original plural video frames;
FIG. 17 is a view explaining video location information for referring to the original video frame;
FIG. 18 is a view for explaining video location information for referring to the image data file;
FIG. 19 is a flowchart for explaining a special reproduction;
FIG. 20 is a view for explaining a method for extracting video data in accordance with a motion of a screen;
FIG. 21 is a view for explaining a method for extracting video data in accordance with a motion of a screen;
FIG. 22 is a flowchart showing one example for calculating a display time at which a scene change quantity becomes as constant as possible;
FIG. 23 is a flowchart showing one example for calculating a scene change quantity of the whole frame from an MPEG video;
FIG. 24 is a view for explaining a method for calculating a scene change quantity of a video from an MPEG stream;
FIG. 25 is a view for explaining a processing procedure for calculating a display time at which a scene change quantity becomes as constant as possible;
FIG. 26 is a flowchart showing one example of the special reproduction on the basis of special reproduction control information;
FIG. 27 is a flowchart showing one example for conducting special reproduction on the basis of a display cycle;
FIG. 28 is a view for explaining a relationship between a calculated display time and the display cycle;
FIG. 29 is a view for explaining a relationship between a calculated display time and the display cycle;

FIG. 30 is a view showing another example of a data structure of special reproduction control information;
FIG. 31 is a view explaining a method for extracting video data in accordance with a motion of a screen;
FIG. 32 is a view explaining video location information for referring to the original video frame;
FIG. 33 is a view showing another example of a data structure of special reproduction control information;
FIG. 34 is a view showing another example of a data structure of special reproduction control information;
FIG. 35 is a view showing another example of a data structure of special reproduction control information;
FIG. 36 is a flowchart showing one example for calculating display time from the importance;
FIG. 37 is a view for explaining a method for calculating display time from the importance;
FIG. 38 is a flowchart showing one example for calculating importance data on the basis of the idea that a scene having a large sound level is important;
FIG. 39 is a flowchart showing one example for calculating importance data on the basis of the idea that a scene in which many important words appear with sound recognition is important, or a processing procedure for calculating importance data on the basis of the idea that a scene in which many words are spoken per unit time is important;
FIG. 40 is a flowchart showing one example for calculating importance data on the basis of the idea that a scene in which many important words appear with telop recognition is important, or a processing procedure for calculating importance data on the basis of the idea that a scene in which the number of words included in the telop appearing per unit time is large with telop recognition is important;
FIG. 41 is a flowchart showing one example for calculating importance data on the basis of the idea that a scene in which a large character appears as a telop is important;
FIG. 42 is a flowchart showing one example for calculating importance data on the basis of the idea that a scene in which many human faces appear is important, or a processing for calculating importance data on the basis of the idea that a scene where human faces are displayed in an enlarged manner is important;
FIG. 43 is a flowchart showing one example for calculating importance data on the basis of the idea that a scene in which videos similar to a registered important scene appear is important;
FIG. 44 is a view showing another example of a data structure of special reproduction control information;

FIG. 45 is a view showing another example of a data structure of special reproduction control information;
FIG. 46 is a view showing another example of a data structure of special reproduction control information;
FIG. 47 is a view for explaining a relationship between information as to whether the scene is to be reproduced or not and the reproduced video;
FIG. 48 is a flowchart showing one example of a processing procedure of special reproduction including reproduction and non-reproduction judgment;
FIG. 49 is a view showing one example of a data structure when sound information or text information is added;
FIG. 50 is a view showing one example of a data structure for describing only sound information separately from frame information;
FIG. 51 is a view showing one example of a data structure for describing only text information separately from frame information;
FIG. 52 is a view for explaining a synchronization of a reproduction of each of the media;
FIG. 53 is a flowchart showing one example of a determination procedure of a sound reproduction start time and a sound reproduction time in a video frame section;
FIG. 54 is a flowchart showing one example for preparing reproduction sound data and correcting video frame display time;
FIG. 55 is a flowchart showing one example of a processing procedure of obtaining text information with telop recognition;
FIG. 56 is a flowchart showing one example of a processing procedure of obtaining text information with sound recognition;
FIG. 57 is a flowchart showing one example of a processing procedure of preparing text information;
FIGS. 58A and 58B are views for explaining a method of displaying text information;
FIG. 59 is a view showing one example of a data structure of special reproduction control information for sound information;
FIG. 60 is a view showing another example of a data structure of special reproduction control information for sound information;
FIG. 61 is a view explaining a summary reproduction of the sound/music data; and
FIG. 62 is a view explaining another summary reproduction of the sound/music data.

[0020] Preferred embodiments of the present invention will now be
described with reference to the accompanying drawings.

[0021] The embodiments relate to a reproduction of video contents having
video data using special reproduction control information. The video data
comprises a set of video frames (video frame group) constituting a motion
picture.

[0022] The special reproduction control information is created from the
video data by a special reproduction control information creating
apparatus and attached to the video data. The special reproduction is
reproduction by a method other than a normal reproduction. The special
reproduction includes a double speed reproduction (or a high speed
reproduction), jump reproduction (or jump continuous reproduction), and a
trick reproduction. The trick reproduction includes a substituted
reproduction, an overlapped reproduction, a slow reproduction and the
like. The special reproduction control information is referred to when the
special reproduction is executed in the video reproduction apparatus.

[0023] FIG. 1 shows one example of a basic data structure of the special
reproduction control information.
[0024] In this data structure, plural items of frame information "i"
(i = 1 to N) are described in correspondence to the frame appearance order
in the video data. Each item of frame information 100 includes a set of
video location information 101 and display time control information 102.
The video location information 101 indicates a location of video data to
be displayed at the time of special reproduction. The video data to be
displayed may be one frame, a group of a plurality of continuous frames,
or a group formed of a part of a plurality of continuous frames. The
display time control information 102 forms the basis for calculating the
display time of the video data.
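
The data structure of FIG. 1 can be pictured with a minimal sketch such as the following. The class and field names are illustrative assumptions and do not appear in the original; in particular, expressing the video location as a start/end frame index and the display time in seconds is only one possible encoding.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FrameInformation:
    """One item of frame information (100); field names are hypothetical."""
    start_frame: int     # video location information (101): first frame to display
    end_frame: int       # last frame of the group (same as start_frame for one frame)
    display_time: float  # display time control information (102), assumed in seconds


@dataclass
class SpecialReproductionControlInfo:
    """Container for frame information "i" (i = 1 to N)."""
    frames: List[FrameInformation] = field(default_factory=list)
    reproduction_rate: Optional[float] = None  # optional item (103)

    def total_display_time(self) -> float:
        # Sum of the display times described for all items.
        return sum(f.display_time for f in self.frames)


info = SpecialReproductionControlInfo(
    frames=[
        FrameInformation(start_frame=0, end_frame=0, display_time=0.5),
        FrameInformation(start_frame=120, end_frame=149, display_time=1.5),
    ]
)
print(info.total_display_time())
```

The first item refers to a single frame and the second to a group of continuous frames, matching the two cases named in the paragraph above.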

[0025] In FIG. 1, the frame information "i" is arranged in the order of
appearance of the frames in the video data. When information indicating
the order of the frame information is described in the frame information
"i", the frame information "i" may be arranged and described in any order.

[0026] The reproduction rate information 103 attached to a plurality of
items of frame information "i" shows the reproduction speed rate and is
used for designating reproduction at a speed several times higher than
that corresponding to the display time described by the display time
control information 102. However, the reproduction rate information 103 is
not essential information. The information 103 may always be attached, may
never be attached, or may be selectively attached. Even when the
reproduction rate information 103 is attached, the information need not be
used at the time of special reproduction. The reproduction rate
information may always be used, may never be used, or may be selectively
used.
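
As a rough illustration of how an optional reproduction rate could be applied, the sketch below divides each described display time by the rate. This division is an assumed interpretation of "a speed several times higher"; the function name and parameters are hypothetical and not taken from the original.

```python
from typing import List, Optional


def effective_display_times(
    display_times: List[float],
    reproduction_rate: Optional[float] = None,
    use_rate: bool = True,
) -> List[float]:
    """Return the display times actually used at reproduction.

    The rate is optional (it may not be attached) and may be ignored
    even when attached (use_rate=False).
    """
    if reproduction_rate is None or not use_rate:
        return list(display_times)
    if reproduction_rate <= 0:
        raise ValueError("reproduction_rate must be positive")
    # Assumption: a rate of r shortens every display time by a factor of r.
    return [t / reproduction_rate for t in display_times]


print(effective_display_times([2.0, 1.0], reproduction_rate=2.0))
```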

[0027] In FIG. 1, it is possible to further add other control information
to the frame information group together with the reproduction rate
information or in place of the reproduction rate information. In FIG. 1,
it is also possible to add different control information to each item of
frame information "i". In these cases, all of the information included in
the special reproduction control information may be used on the side of
the video reproduction device, or only a part of the information may be
used.
[0028] FIG. 2 shows an example of a structure of an apparatus for creating
special reproduction control information.

[0029] This special reproduction control information creating device
comprises a video data storage unit 2, a video data processing unit 1
including a video location information processing unit 11 and a display
time control information processing unit 12, and a special reproduction
control information storage unit 3. As will be described later in detail,
since the video data (encoded data) is decoded before being displayed, a
processing time for decoding the video data is required from when the
display instruction is issued until the video is displayed. In order to
reduce this processing time, it is proposed to decode the video data
beforehand and store it as an image data file.
[0030] If an image data file is used (the image data file may be
constantly used or selectively used), an image data file creating unit 13
(in the video data processing unit 1) and an image data file storage unit
4 are further provided as shown in FIG. 3. If other control information
determined on the basis of the video data is added to the special
reproduction control information, the corresponding function is
appropriately added to the inside of the video data processing unit 1.
[0031] If an operation by a user intervenes in this processing, a GUI is
used for displaying, for example, video data in frame units and for
receiving an input of an instruction from the user, though the GUI is
omitted in FIGS. 2 and 3.

[0032] In FIGS. 2 and 3, a CPU, a memory, an external storage device, and
a network communication device are provided when needed, and software such
as driver software used when needed and an OS are not shown.
[0033] The video data storage unit 2 stores video data which becomes a
target of processing for creating special reproduction control information
(or special reproduction control information and image data files).

[0034] The special reproduction control information storage unit 3 stores
special reproduction control information that has been created.

[0035] The image data file storage unit 4 stores image data files that
have been created.

[0036] The storage units 2, 3, and 4 comprise, for example, a hard disk,
an optical disk and a semiconductor memory. The storage units may comprise
separate storage devices. All or part of the storage units may comprise
the same storage device.
[0037] The video data processing unit 1 creates the special reproduction
control information (or the special reproduction control information and
image data file) on the basis of the video data which becomes a target of
processing.

[0038] The video location information processing unit 11 determines
(extracts) a video frame (group) which should be displayed or which can be
displayed at the time of special reproduction, and prepares the video
location information 101 which should be described in each item of frame
information "i".

[0039] The display time control information processing unit 12 prepares
the display time control information 102 associated with the display time
of the video frame (group) associated with each item of frame information
"i".

[0040] The image data file creating unit 13 prepares an image data file
from the video data.

[0041] The special reproduction control information creating apparatus can
be realized, for example, in the form of software executed on a computer.
The apparatus may be realized as a dedicated apparatus for creating the
special reproduction control information.
[0042] FIG. 4 shows an example of a processing procedure in the case of
the structure of FIG. 2. The video data is read (step S11), video location
information 101 is created (step S12), display time control information
102 is created (step S13), and special reproduction control information is
stored (step S14). The procedure of FIG. 4 may be consecutively conducted
for each item of frame information, or each processing may be conducted in
batches. Other procedures can also be conducted.
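
The batch form of steps S11 to S14 can be sketched as below. This is only an illustration under stated assumptions: the function names, the JSON serialization, and the two helper policies (every 30th frame, a fixed 0.5 second display time) are hypothetical stand-ins, not the methods of the embodiment.

```python
import json
from typing import Callable, Dict, List, Sequence


def create_control_info(
    video_frames: Sequence[object],
    select_locations: Callable[[Sequence[object]], List[int]],
    display_time_for: Callable[[Sequence[object], int], float],
) -> str:
    """Batch version of steps S11 to S14 (hypothetical helper)."""
    # S11: the video data is assumed to be already read into video_frames.
    # S12: create the video location information (here, frame indices).
    locations = select_locations(video_frames)
    # S13: create the display time control information for each location.
    items: List[Dict[str, object]] = [
        {"location": loc, "display_time": display_time_for(video_frames, loc)}
        for loc in locations
    ]
    # S14: serialize so the caller can store the result (JSON is an assumption).
    return json.dumps({"frame_information": items})


def every_30th_frame(frames: Sequence[object]) -> List[int]:
    # Placeholder extraction policy: one frame out of every 30.
    return list(range(0, len(frames), 30))


def half_second(frames: Sequence[object], location: int) -> float:
    # Placeholder display time policy: a fixed 0.5 seconds per item.
    return 0.5


stored = create_control_info(list(range(90)), every_30th_frame, half_second)
print(stored)
```

Passing the two policies as callables keeps the extraction and display time steps separate, mirroring the split between the processing units 11 and 12.
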
[0043] FIG. 5 shows an example of a processing procedure in the case of
the structure of FIG. 3. A procedure for preparing and storing image data
files is added to the procedure of FIG. 4 (step S22). The image data file
is created and/or stored together with the preparation of the video
location information 101. It is also possible to create the video location
information 101 at a timing different from that of FIG. 4. In the same
manner as the case of FIG. 4, the procedure of FIG. 5 may be conducted for
each item of frame information, or may be conducted in batches. Other
procedures can also be conducted.
[0044] FIG. 6 shows an example of a video reproduction apparatus.

[0045] This video reproduction apparatus comprises a controller 21, a
normal reproduction processing unit 22, a special reproduction processing
unit 23, a display device 24, and a contents storage unit 25. If contents
are handled wherein audio such as sound or the like is added to the video
data, it is preferable to provide a sound output section. If contents are
handled wherein text data is added to the video data, the text may be
displayed on the display device 24, or may be output from the sound output
section. If contents are handled wherein a program is attached, an
attached program execution section may be provided.
[0046] The contents storage unit 25 stores at least video data and special
reproduction control information. As will be described later in detail, in
the case where the image data file is used, the image data file is further
stored. The sound data, the text data, and the attached program are
further stored in some cases.
[0047] The contents storage unit 25 may be arranged at one location in a
concentrated manner, or may be arranged in a distributed manner. The point
is that the contents can be accessed by the normal reproduction processing
unit 22 and the special reproduction processing unit 23. The video data,
special reproduction control information, image data files, sound data,
text data, and attached program may be stored in separate media or may be
stored in the same medium. As the medium, for example, a DVD is used.
These may also be data transmitted via a network.

[0048] The controller 21 basically receives an instruction such as a
normal reproduction or a special reproduction with respect to the contents
from the user via a user interface such as a GUI or the like. The
controller 21 instructs the corresponding processing unit to reproduce the
designated contents by the designated method.

[0049] The normal reproduction processing unit 22 is used for the normal
reproduction of the designated contents.

[0050] The special reproduction processing unit 23 is 
used for the special reproduction (for example, a high 
speed reproduction, jump reproduction, trick reproduc- 
tion, or the like) of the designated contents by referring 
to the special reproduction control information. 
[0051] The display device 24 is used for displaying a 
video. 

[0052] The video reproduction apparatus can be real- 
ized by computer software. It may partially be realized 
by hardware (for example, decode board (MPEG-2 de- 
coder) or the like). The video reproduction apparatus 
may be realized as a dedicated device for video repro- 
duction. 



[0053] FIG. 7 shows one example of a reproduction processing procedure of
the video reproduction apparatus of FIG. 6. At step S31, it is determined
whether the user requests a normal reproduction or a special reproduction.
When a normal reproduction is requested, the designated video data is read
at step S32 and a normal reproduction is conducted at step S33. When a
special reproduction is requested by the user, the special reproduction
control information corresponding to the designated video data is read at
step S34, and the location of the video data to be displayed is specified
and the display time is determined at step S35. The corresponding frame
(group) is read from the video data (or the image data file) at step S36
to conduct special reproduction of the designated contents at step S37.
The location of the video data can be specified and the display time can
be determined at a timing different from that in FIG. 7. The procedure of
FIG. 7 may be consecutively conducted for each item of frame information,
or each processing may be conducted in batches. Other procedures can be
conducted. For example, in the case of a reproduction method in which the
display time of each frame is equally set to a constant value, it is not
necessary to determine the display time.
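
The branch of FIG. 7 can be sketched as follows, processing one item of frame information at a time. The function signature, the dictionary keys, and the `show` callback standing in for the display device are assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Optional, Sequence


def reproduce(
    mode: str,
    video_frames: Sequence[str],
    control_info: List[Dict[str, float]],
    show: Callable[[str, Optional[float]], None],
) -> None:
    """Sketch of steps S31 to S37; `show` stands in for the display device."""
    if mode == "normal":                        # S31: normal reproduction requested
        for frame in video_frames:              # S32: read the video data
            show(frame, None)                   # S33: normal reproduction
        return
    if mode != "special":
        raise ValueError(f"unknown mode: {mode}")
    for item in control_info:                   # S34: read the control information
        location = int(item["location"])        # S35: specify the location and
        display_time = item["display_time"]     #      determine the display time
        frame = video_frames[location]          # S36: read the corresponding frame
        show(frame, display_time)               # S37: special reproduction


shown: List[tuple] = []
reproduce(
    "special",
    ["f0", "f1", "f2", "f3"],
    [{"location": 1, "display_time": 0.5}, {"location": 3, "display_time": 1.0}],
    lambda frame, seconds: shown.append((frame, seconds)),
)
print(shown)
```
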
[0054] In both the normal reproduction and the special reproduction, the
user may make various designations (for example, the start point or the
end point of the reproduction in the contents, a reproduction speed in the
high speed reproduction, a reproduction time in the high speed
reproduction, and other methods such as special reproduction or the like).

[0055] Next, an algorithm for creating the frame information of the
special reproduction control information and an algorithm for calculating
the display time of the special reproduction will be schematically
explained.

[0056] At the time of creating the frame information, the frame
information to be used at the time of the special reproduction is
determined from the video data, the video location information is created,
and the display time control information is created.
[0057] The frame is determined by such methods as: (1) a method for
calculating the video frame on the basis of some characteristic quantity
with respect to the video data (for example, a method for extracting the
video frames such that the total of the characteristic quantity (for
example, the scene change quantity) between the extracted frames becomes
constant, and a method for extracting the video frames such that the total
of importance between the extracted frames becomes constant), and (2) a
method for calculating the video frame on a fixed standard (for example, a
method for extracting frames at random, and a method for extracting frames
at an equal interval). The scene change quantity is also called a frame
activity value.
[0058] In the creation of the display time control
information 121, there are available: (i) a method for
calculating an absolute value or a relative value of the
display time or a display frame number, (ii) a method for
calculating reference information which is a base of the
display time and a display frame number (for example, the
information designated by the user, characters in the
video, sound synchronized with the video, persons in the
video, and the importance obtained on the basis of a
specific pattern in the video), and (iii) a method for
describing both (i) and (ii).

[0059] It is possible to appropriately combine (1) or (2)
with (i), (ii) or (iii). Needless to say, other methods are
possible. One specific combination out of such methods can
be used, or a plurality of combinations of these methods may
be provided and appropriately selected.

[0060] In a specific case, at the same time as the
determination of the frame by the method (1), a relative
value of the display time and the number of display frames
are determined. If this method is constantly used, it is
possible to omit the display time control information
processing unit 102.

[0061] At the time of the special reproduction, it is
assumed that the special reproduction is conducted by
referring to the display time control information 121 of (i),
(ii) or (iii) included in the frame information. However,
the described value may be followed as it is, or the
described value may be corrected and used. In addition to
the described value and the corrected value thereof,
independently created other information and information
input from the user may be used. Alternatively, only the
independently created other information and the information
input from the user may be used. A plurality of these
methods may be enabled and appropriately selected.

[0062] Next, an outline of the special reproduction will
be explained.

[0063] A double speed reproduction (or a high speed
reproduction) carries out reproduction in a time shorter
than the time required for the normal reproduction of the
original contents by reproducing a part of the frames out
of the whole frames constituting the video data contents.
For example, the frames indicated by the frame information
are displayed for each display time indicated by the display
time control information 121, in the order of time sequence.
Based on a request from the user, such as a speed
designation request for designating at how many times the
speed of the normal reproduction the original contents are
reproduced (that is, in what fraction of the time required
for the normal reproduction the original contents are
reproduced) or a time designation request for designating
how much time is taken for reproducing the contents, the
display time of each frame (group) is determined to satisfy
the reproduction request. The high speed reproduction is
also called a summarized reproduction.
[0064] A jump reproduction (or a jump continuous
reproduction) is such that a part of the frames shown in the
frame information is subjected to non-reproduction, for
example, on the basis of the reproduction/non-reproduction
information described later in the high speed reproduction.
The high speed reproduction is conducted with respect to the
frames excluding the frames which are subjected to
non-reproduction out of the frames shown in the frame
information.
[0065] A trick reproduction is any reproduction other than
the normal reproduction, excluding the high speed
reproduction and the jump reproduction. For example, at the
time of reproducing the frames shown in the frame
information, various forms can be considered, such as a
substituted reproduction for reproducing a certain portion
by replacing the order of time sequence, an overlapped
reproduction for reproducing a certain portion repeatedly a
plurality of times, a variable speed reproduction in which a
certain portion is reproduced at a speed lower than the
reproduction of another portion (including the case in which
the portion is reproduced at the speed of normal
reproduction, or the case in which the portion is reproduced
at a speed lower than the normal reproduction speed) or at a
speed higher than another portion, or the reproduction of a
certain portion is temporarily suspended, or such forms of
reproduction are appropriately combined, and a random
reproduction for reproducing at a random time sequence for
each of a constant set of frames shown in the frame
information.
[0066] Needless to say, it is possible to appropriately
combine a plurality of kinds of methods. For example,
at the time of the double speed reproduction, the important
portion is reproduced a plurality of times, and various
variations are considered such as a method for setting a
reproduction speed to a normal reproduction speed.

[0067] Hereinafter, embodiments of the present in- 
vention will be specifically explained in detail. 
[0068] In the beginning, the embodiments will be explained
by taking as an example a case in which a reproduction frame
is determined on the basis of the scene change quantity
between adjacent frames as the characteristic quantity of
the video data.
[0069] Here, there will be explained a case in which
one frame corresponds to one item of frame information.
[0070] FIG. 8 shows one example of a data structure
of the special reproduction control information created
for the target video data.

[0071] The data structure is such that the display time
information 121, which is information showing an absolute or
a relative display time, is described as the display time
control information 102 in FIG. 1 (or instead of the display
time control information 102). A structure describing the
importance in addition to the display time control
information 102 will be described later.

[0072] The video location information 101 is information
which enables the specification of the location in the
original video of the frame, and either a frame number (for
example, a sequence number from the first frame) or a number
which specifies one frame in a stream, like a time stamp,
may be used. If the video data corresponding to the frame
extracted from the original video stream is set as a
separate file, a URL or the like may be used as information
for specifying the file location.

[0073] The display time information 121 is information
which specifies the time for displaying the video or the
number of frames. It is possible to describe the actual time
or the number of frames as a unit, or a relative value
(for example, a normalized numeric value) which clarifies
the relationship of the relative time length with the
display time information described in other frame
information. In the latter case, the actual reproduction
time of each video is calculated from the total reproduction
time as a whole. With respect to each video, instead of
describing the continuation time of the display, a
description with a combination of a start time measured from
a specific timing (for example, the start time of the first
video is set to 0) and an end time, or a description with a
combination of the start time and the continuation time, may
be used.
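
The relative-value case can be illustrated with a minimal sketch. It assumes that each item of frame information carries a relative weight and that the total reproduction time is supplied separately; the names `FrameInfo`, `video_location`, `display_time` and `resolve_display_times` are hypothetical and do not appear in the description.

```python
from dataclasses import dataclass


@dataclass
class FrameInfo:
    """One item of frame information (field names are illustrative only)."""
    video_location: int   # e.g. a frame number in the original video stream
    display_time: float   # here used as a relative weight, not seconds


def resolve_display_times(items, total_seconds):
    """Scale relative display-time weights so that they sum to total_seconds."""
    weight_sum = sum(item.display_time for item in items)
    if weight_sum <= 0:
        raise ValueError("relative display times must sum to a positive value")
    return [total_seconds * item.display_time / weight_sum for item in items]


items = [FrameInfo(0, 1.0), FrameInfo(120, 2.0), FrameInfo(300, 1.0)]
actual = resolve_display_times(items, total_seconds=8.0)
print(actual)  # [2.0, 4.0, 2.0]
```

A start time/end time description could be derived from the same list by accumulating the resolved values, with the first start time set to 0.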

[0074] In the special reproduction, basically the
reproduction of the video present at the location specified
with the video location information 101, for the display
time specified with the display time information 121, is
conducted consecutively for each of the items of frame
information included in the arrangement, such as shown in
FIG. 8.
[0075] If the start time and the end time or the
continuation time are specified and this designation is
followed, the video present at the location specified with
the video location information 101 is reproduced from the
start time specified with the display time information 121
up to the end time or during the continuation time,
consecutively for each of the items of frame information
included in the arrangement.
[0076] The described display time can be processed
and reproduced by using parameters such as reproduction
rate information and additional information.
[0077] Next, a method for describing the video location
information will be explained by using FIGS. 9
through 11.

[0078] FIG. 9 explains a method for describing the video
location information referring to the original video
frame.

[0079] In FIG. 9, a time axis 200 corresponds to the
original video stream based on which the frame information
for the special reproduction is created, and a video 201
corresponds to one frame which becomes a description target
in the video stream. A time axis 202 corresponds to the
reproduction time of a video at the time of the special
reproduction by using the video 201 extracted from the
original video stream. A display time 203 is a section, on
the time axis 202, corresponding to one video 201. For
example, the video location information 101 showing the
location of the video 201 and the video display time 121
showing the length of the display time 203 are described as
frame information. As described above, the description of
the location of the video 201 may be given in any form such
as a frame number, a time stamp or the like as long as one
frame in the original video stream can be specified. This
frame information will be described in the same manner with
respect to the other videos 201.
[0080] FIG. 10 explains a method for describing the
video location information referring to the image data
file.

[0081] The method for describing the video location
information shown in FIG. 9 directly refers to the frame
in the original video which is to be subjected to the
special reproduction. The method for describing the video
location information shown in FIG. 10 is a method in
which an image data file 300 corresponding to a single
frame 302 extracted from the original video stream is
created in a separate file, and the location thereof is
described. In the method for describing the file location,
the file can be handled in the same manner by using, for
example, the URL or the like both in the case where the file
is present on a local storage device and in the case where
the file is present on the network. A set of the video
location information 101 showing the location of this image
data file and the video display time 121 showing the length
of the corresponding display time 301 is described as
frame information.

[0082] If a correspondence to the original video frame
is required, the information (similar to the video location
information in the case of, for example, FIG. 9) showing
a single frame 302 of the original video corresponding
to the described frame information may be included in
the frame information. The frame information may comprise
the video location information, the display time
information and the original video information. When the
original video information is not required, it is not
required to describe the original video information.
[0083] The configuration of the video data described
with the method of FIG. 10 is not particularly restricted.
For example, the frame of the original video may be
used as it is or may be reduced. This is effective for
conducting the reproduction processing at a high speed
because it is not required to decode the original video.
[0084] If the original video stream is compressed by
means of MPEG-1, MPEG-2 or the like, a reduced video can be
created at a high speed only by partially decoding the
streams. In this method, only the DCT (discrete cosine
transform) coefficients of an I picture frame encoded within
the frame (an inner-frame encoded frame) are decoded and a
reduced video is created by using the DCT coefficients.

[0085] In the description method of FIG. 10, the image
data files are stored in separate files. However, these
files may be stored as a package in a video data group
storage file having a video format (for example, motion
JPEG) which can be accessed at random. The location
of the video data is specified by a combination of the
URL showing the location of the image data file and a frame
number or a time stamp showing the location in the image
data file. The URL information showing the location
of the image data file may be described in each frame
information or may be described as additional information
outside of the arrangement of the frame information.
[0086] Various methods can be taken to select the
frame of the original video or the like and create the video
data to describe the video location information. For
example, the video data may be extracted at an equal
interval from the original video. Where the motion of the
screen quite often appears, the video data is selected
in a narrow interval. Where the motion of the screen
quite rarely appears, the video frame is selected in a
wide interval.

[0087] Here, referring to FIG. 11, there will be explained,
as one example of a method for selecting frames, a method in
which the frame is selected in a narrow interval where the
motion of the screen quite often appears while the frame is
selected in a wide interval where the motion of the screen
rarely appears.
[0088] In FIG. 11, a horizontal axis represents the
selected frame number, and a curve 800 represents a
change in the scene change quantity (between adjacent
frames). A method for calculating the scene change
quantity is the same as a method at the time of calculating
the display time described later. Here, in order to
determine an extraction interval in accordance with the
motion of the scene, there is shown a method for calculating
an interval at which the scene change quantity between video
frames from which the video data is extracted becomes
constant. The total of the scene change quantity between
video frames from which the video data is extracted is set
to S_i, and the total of the scene change quantity in the
whole frame is set to S (= ΣS_i) while the number of data
items to be extracted is n. In order to set the scene change
quantity between video frames from which video data is
extracted to a constant level, S_i = S/n may be provided. In
FIG. 11, the area S_i of the scene change quantity curve 800
divided with the broken lines becomes constant. Then, for
example, the scene change quantity is accumulated from the
extracted frame, so that the video frame at which the
accumulated value exceeds S/n is set as the frame F_i from
which the video data is extracted.
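
The accumulation rule of FIG. 11 can be sketched as follows. This is only an illustration under stated assumptions: the first frame is always selected, the accumulated value is cleared at each selected frame, and a frame is taken when the accumulated value reaches or exceeds S/n. The function name `select_frames` and the list layout are hypothetical.

```python
def select_frames(scene_change, n):
    """Pick frames so that each interval accumulates roughly S/n of scene change.

    scene_change[k] is the scene change quantity between frame k and frame k+1.
    Returns the selected frame numbers in ascending order.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    total = sum(scene_change)      # S
    threshold = total / n          # S / n
    selected = [0]                 # assumption: the first frame is always taken
    accumulated = 0.0
    for k, quantity in enumerate(scene_change):
        accumulated += quantity
        if accumulated >= threshold and len(selected) < n:
            selected.append(k + 1)  # frame at which the accumulated value reaches S/n
            accumulated = 0.0       # assumption: restart accumulation here
    return selected


# Motion is concentrated in the middle, so the middle frames are sampled densely.
print(select_frames([1, 1, 4, 4, 1, 1], 3))  # [0, 3, 4]
```

With a flat scene change curve the same function degenerates to extraction at an equal interval, which matches the fixed-standard method mentioned earlier.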

[0089] If the video data is created from an I picture frame
of MPEG, the calculated video frame from which the video
data is to be created is not necessarily an I picture. In
that case, the video data is created from the I picture
frame in the vicinity thereof.

[0090] By the way, in the method explained in FIG. 11,
the video frame which belongs to the section of the
scene change quantity = 0 is skipped. However, if a still
picture continues, the scene is important in many cases.
Then, if the scene change quantity = 0 continues for
more than a constant time, the frame at that time may
be extracted. For example, the scene change quantity
may be accumulated from the extracted frame so that
the frame having the value exceeding S/n or the frame
at which the scene change quantity = 0 continues for
more than a constant time may be set as a frame F_i from
which the video data is extracted. The accumulated value
of the scene change quantity may be or may not be
cleared to 0. It is possible to selectively clear the
accumulated value based on a request from the user.
[0091] In the case of the example of FIG. 11, it is assumed
that the display time information 121 is described
so that the display time becomes the same with respect
to any of the frames. When the video is reproduced in
accordance with this display time information 121, the
scene change quantity becomes constant. The display
time information 121 may be determined and described
in a separate method.

[0092] Next, there will be explained a case in which
one or a plurality of frames are allowed to correspond
to one item of frame information.
[0093] One example of the data structure of the special
reproduction control information is the same as
that in FIG. 8.



[0094] Hereinafter, a method for describing the video
location information will be explained by using FIGS. 12
through 14.

[0095] FIG. 12 explains a method for describing the 
video location information for referring to the continuous 



frames of the original video. 

[0096] A method for describing the video location in- 
formation shown in FIG. 9 refers to one frame 201 in 
one original video for conducting the special reproduction.
However, the method for describing the video location
information shown in FIG. 12 describes a set 500
of a plurality of continuous frames in the original video. 
The set 500 of frames may include some frames extract- 
ed from the plural continuous frames within the original 
video. The set 500 of frames may include only one 
frame. 

[0097] If the set 500 of frames includes a plurality of 
continuous frames or one frame in the original video, the 
location of the start frame and the location of the end 
frame are described, or the location of the start frame 
and the continuation time of the set 500 are described 
in the description of the frame location (if one frame is
included, for example, the start frame is set equal to the
end frame). In the description of the location and the
time, the frame number, the time stamp and the like
which can specify frames in the streams are used.
[0098] If the set 500 of frames is a part out of a plurality
of continuous frames in the original video, information
is described which enables the specification of the
frames. If the method for extracting the frames is
determined, and the frames can be specified with the
description of the locations of the start frame and the end
frame, the start frame or the end frame may be described.

[0099] The display time information 501 shows the total
display time corresponding to the whole frame group
included in the corresponding frame set 500. The display
time of each frame included in the set 500 of frames
can be appropriately determined on the side of the device
for the special reproduction. As a simple method, there
is available a method in which the above total display
time is equally divided by the total number of frames
in the set 500 to provide the display time of one frame.
Various other methods are available.

[0100] FIG. 13 explains a method for describing video
location information for referring to a set of the image
data files.

[0101] The method for describing the video location 
information shown in FIG. 12 directly refers to continu- 
ous frames in the original video to be reproduced. A 
method for describing the video location information 
shown in FIG. 13 creates a set 600 of the image data 
files corresponding to the original video frame set 602 
extracted from the original video stream in a separate 
file and describes the location thereof. In the method for
describing the file location, the file can be handled in the
same manner even if the file is present on a local storage
device or if the file is present on a network. A set of the
video location information 101 showing the location of this
image data file and the video display time 121 showing a
length of the corresponding display time 601 can be
described as the frame information.
[0102] If a correspondence with the original frame is
required, information showing the frame set 602 of the
original video corresponding to the described frame
information (for example, information similar to the video
location information in the case of FIG. 12) may be
included in the frame information. The frame information
may comprise the video location information, the display
time information and the original video information. The
original video information is not required to be described
when the information is not required.
[0103] The configuration of the video data, the prep- 
aration of the video data, the preparation of the reduced 
video, the method for storing the video data and the 
method for describing the location information such as 
the URL or the like are the same as what has been de- 
scribed above. 

[0104] Various methods can be adopted in the same 
manner as described above as to which frame of the 
original video is selected to create the video data to be 
described in the video location information. For example,
the video data may be extracted at an equal interval
from the original video. Where a motion of the screen
quite often appears, a frame is extracted in a narrow
interval. Where the motion of the screen rarely appears,
a frame is extracted in a wide interval.
[0105] In the above embodiments, the image data file
300 corresponds to the original video 302 in a frame
to frame manner. It is possible to make the location
information of the frame described as the original video
information have a time width.
[0106] FIG. 14 shows an example in which the original
video information is allowed to have a time width with
respect to FIG. 8. Original video information 3701
is added to the frame information structure shown in
FIG. 8. The original video information 3701 comprises
start point information 3702 and section length
information 3703, which are the start point and the section
length of the original video which is a target of the special
reproduction. The original video information 3701 may
comprise any information which can specify the section of
the original video having the time width. It may comprise
the start point information and end point information
instead of the start point information and the length
information.

[0107] FIG. 15 shows an example in which the original
video information is allowed to have a time width with
respect to FIG. 9. In this case, for example, as video
location information, display time information and
original video information included in the same frame
information, the location of the original video frame 3801,
the display time 3802, and the original video frame section
3803 which comprises the start point (frame location)
and the section length are described to show that these
correspond to each other. That is, as a video
representative of the original video frame section 3803, the
original video frame location 3801 described in the video
location information is displayed.
[0108] FIG. 16 shows an example in which the original
video information is allowed to have a time width with
respect to FIG. 10. In this case, for example, as video
location information, display time information and original
video information included in the same frame information,
the location of the image data file 3901 for the display,
the display time 3902, and the original video frame
section 3903 which comprises the start point (frame
location) and the section length are described to show that
these correspond to each other.

[0109] That is, as a video representative of the original
video frame section 3903, the image 3901 in the image
data file described in the video location information is
displayed.

[0110] Furthermore, as shown in FIGS. 12 and 13, if
a set of frames is used as a video for the display, a
section different from the original video frame section for
displaying the video may be allowed to correspond to
the original video information.

[0111] FIG. 17 shows an example in which the original
video information is allowed to have a time width with
respect to FIG. 12. In this case, for example, as video
location information, display time information and
original video information included in the same frame
information, a set 4001 of frames in the original video, the
display time 4002, and the original video frame section
4003 which comprises the start point (frame location)
and the section length are described to show that these
correspond to each other.

[0112] At this time, the section 4001 of a set of frames
which is described as video location information, and
the original video frame section 4003 which is described
as the original video information are not necessarily
required to coincide with each other, and a different section
may be used for display.
[0113] FIG. 18 shows an example in which the original
video information is allowed to have a time width with
respect to FIG. 13. In this case, for example, as video
location information, display time information and
original video information included in the same frame
information, a set 4101 of frames in the video file, the
display time 4102, and the original video frame section 4103
which comprises the start point (frame location) and the
section length are described to show that these
correspond to each other.
[0114] At this time, the section of the set 4101 of frames
described as video location information, and the original
video frame section 4103 described as the original video
information are not necessarily required to coincide with
each other. That is, the section of the set 4101 of the
frames for the display may be shorter or longer than the
original video frame section 4103. Furthermore, a video
having completely different contents may be included
therein. In addition, only a particularly important section
may be extracted from the section described in the original
video location as the image data file so that collected
video data is used.

[0115] At the time of displaying the videos based on,
for example, the summarized reproduction (special
reproduction) using these items of the frame information,
it may be desired that the corresponding frame in the
original video is referred to.

[0116] FIG. 19 shows a flow for starting the reproduction
from the frame of the original video corresponding
to the video frame displayed in special reproduction. At
step S3601, the reproduction start frame is specified in
the special reproduction. At step S3602, the original video
frame corresponding to the specified frame is calculated
with a method described later. At step S3603, the
original video is reproduced from the calculated frame.
[0117] This flow can be used for referring to the
corresponding location of the original video in addition to
special reproduction.

[0118] At step S3602, as one example of a method
for calculating the corresponding original video frame,
there is shown a method for using the proportional
distribution with respect to the display time of the
specified frame. The display time information included in
the i-th frame information is set to D_i sec, the section
start location of the original video information is set to
t_i sec, and the section length is set to d_i sec. If the
location is specified at which t sec has passed from the
start of the reproduction using the i-th frame information,
the frame location of the corresponding original video is
T = t_i + d_i × t/D_i.
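
As a small worked illustration of the proportional distribution, the following sketch evaluates T = t_i + d_i × t/D_i directly. The function name `original_position` and the range check on t are assumptions added for the example and are not part of the description.

```python
def original_position(t_i, d_i, D_i, t):
    """Map elapsed display time t within the i-th frame information to a
    location in the original video: T = t_i + d_i * t / D_i (all in seconds)."""
    if D_i <= 0:
        raise ValueError("display time D_i must be positive")
    if not 0 <= t <= D_i:
        raise ValueError("t must lie within the display time of the item")
    return t_i + d_i * t / D_i


# A 30 s section of the original video starting at 60 s is summarized in 2 s.
# Stopping 0.5 s into that item corresponds to 67.5 s in the original video.
print(original_position(t_i=60.0, d_i=30.0, D_i=2.0, t=0.5))  # 67.5
```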

[0119] Referring to FIGS. 20 and 21, as examples of
a method for selecting a frame, there will be explained
a method for extracting the frame in a narrow interval
where the motion of the screen quite often appears while
extracting the frame in a wide interval where the motion
of the screen rarely appears, in accordance with the motion
of the screen. The horizontal axis, the curve 800,
and S_i and F_i are the same as those in FIG. 11.
[0120] In the example of FIG. 11, the video data is
extracted one frame after another at an interval at which
the scene change quantity between the video frames from
which the video data is extracted is made constant.
FIGS. 20 and 21 show examples in which a set of a plurality
of frames is extracted based on the frame F_i as
reference. For example, as shown in FIG. 20, the same
number of continuous frames may be extracted from F_i.
The frame length 811 and the frame length 812 are equal to
each other. As shown in FIG. 21, the corresponding
number of continuous frames may be extracted so that
the total of the scene change quantity from F_i becomes
constant. The area 813 and the area 814 are equal to each
other. Various other methods can be considered.
[0121] It is possible to use the frame selection method
in which the frame is extracted when the scene change
quantity = 0 continues for more than a constant time.
[0122] As in the case of FIG. 11, the display time
information 121 may be described so that the same display
time may be provided with respect to any of the frame
sets in the cases of FIGS. 20 and 21. Alternatively, the
display time information may be determined and described
in a different method.
[0123] Next, one example of a processing for calculating
the display time will be explained.
[0124] FIG. 22 shows one example of a procedure of
the basic processing for calculating the display time so
that the scene change quantity becomes constant as
much as possible when the video described in the video
location information is continuously reproduced in
accordance with the time described in the display time
information.
[0125] This processing can be applied to a case in
which the frames are extracted in any method. However, if
the frames are extracted in the method shown in
FIG. 11, the processing can be omitted, since the
processing shown in FIG. 11 selects the frames such
that the scene change quantity becomes constant when
the frames are displayed for a fixed time period.

[0126] At step S71, the scene change quantity between
adjacent frames is calculated with respect to all
frames of the original video. If each frame of the video
is represented in bit map, the differential value of the
pixels between adjacent frames can be set to the scene
change quantity. If the video is compressed with MPEG,
the scene change quantity can be calculated by using
a motion vector.

[0127] One example of a method for calculating the
scene change quantity will be explained.

[0128] FIG. 23 shows one example of a basic
processing procedure for calculating a scene change
quantity of all frames from the video streams compressed
with MPEG.
[0129] At step S81, a motion vector is extracted from
the P picture frame. The video frame compressed with
the MPEG is described with an arrangement of I picture
(an inner-frame encoded frame), P picture (an inter-frame
encoded frame in a forward prediction), and B picture
(an inter-frame encoded frame in a backward prediction),
as shown in FIG. 24. The P picture includes a
motion vector corresponding to a motion from the
preceding I picture or P picture.
[0130] At step S82, the magnitude (intensity) of
each motion vector included in the frame of one P picture
is calculated, and an average thereof is set as a
scene change quantity from the preceding I picture or P
picture.

[0131] At step S83, on the basis of the scene change quantity
calculated with respect to the P pictures, the scene change quantity
per frame is calculated for the frames other than the P pictures. For
example, if the average magnitude of the motion vectors of a P picture
frame is p, and the interval from the preceding I picture or P picture
to which that frame refers is d, the scene change quantity per frame of
each frame in that interval is set to p/d.
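Steps S81 to S83 can be sketched as follows. The input layout (a list of `(picture_type, motion_vectors)` tuples in display order) and the function name `per_frame_scene_change` are assumptions made for illustration. Frames not covered by any P picture interval (for example the leading I picture) are given 0.0 here, which the description does not specify.

```python
import math

def per_frame_scene_change(pictures):
    """Spread each P picture's average motion magnitude p over the d frames
    since the preceding I or P picture, giving p/d per frame."""
    quantities = [0.0] * len(pictures)   # assumption: uncovered frames get 0.0
    last_ref = None                      # index of the most recent I or P picture
    for idx, (ptype, vectors) in enumerate(pictures):
        if ptype == "P" and last_ref is not None and vectors:
            # p: average magnitude of the motion vectors of this P picture
            p = sum(math.hypot(dx, dy) for dx, dy in vectors) / len(vectors)
            d = idx - last_ref           # interval from the referenced picture
            for k in range(last_ref + 1, idx + 1):
                quantities[k] = p / d
        if ptype in ("I", "P"):
            last_ref = idx
    return quantities
```

For a sequence I, B, B, P whose P picture has an average vector magnitude of 7.5, each of the three frames after the I picture receives 7.5 / 3 = 2.5.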

[0132] Subsequently, at step S72 in the procedure of FIG. 22, the
total of the scene change quantities of the frames from each
description target frame described in the video location information
up to the next description target frame is calculated.

[0133] FIG. 25 shows the change in the scene change quantity for each
frame. The horizontal axis corresponds to the frame number, while a
curve 1000 denotes the change in the scene change quantity. When the
display time of the video having the location information of the frame
information $F_i$ is calculated, the scene change quantity is
accumulated over the section 1001 up to $F_{i+1}$, which corresponds to
the frame location of the next description target frame. This total is
the area $S_i$ of the hatched portion 1002, and is regarded as the
magnitude of the motion at the frame location $F_i$.
[0134] Subsequently, at step S73 in the procedure of FIG. 22, the
display time of each frame is calculated. In order to keep the scene
change quantity as constant as possible, more display time need only be
allocated to the frames where the motion of the screen is large, so the
ratio of the display time allocated to the video of each frame location
$F_i$ to the reproduction time may be set to $S_i / \sum_j S_j$. When
the total reproduction time is set to $T$, the display time of each
video is set to $D_i = T \times S_i / \sum_j S_j$. The value of the
total reproduction time $T$ is defined as the total reproduction time
of the original video.
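Steps S72 and S73 amount to summing the per-frame quantities over each section and then dividing the total time proportionally. The sketch below assumes that `per_frame[k]` holds the scene change quantity attributed to frame k, that `target_frames` lists the description target frame numbers in ascending order, and that the last section runs to the end of the video; these conventions and the function names are illustrative assumptions.

```python
def section_totals(per_frame, target_frames):
    """S_i: total scene change quantity from target frame i up to the next one."""
    bounds = list(target_frames) + [len(per_frame)]
    return [sum(per_frame[a:b]) for a, b in zip(bounds, bounds[1:])]

def display_times(totals, total_time):
    """D_i = T * S_i / sum_j S_j."""
    grand_total = sum(totals)
    if grand_total == 0:
        raise ValueError("all sections have zero scene change quantity")
    return [total_time * s / grand_total for s in totals]
```

With per-frame quantities `[1, 1, 2, 0, 4, 4, 0, 0]`, target frames `[0, 2, 4]` and T = 12, the section totals are 2, 2 and 8, so the section with the most motion receives 8 of the 12 seconds.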
[0135] If no scene change appears and $S_i = 0$, a lower limit value
(for example, 1) which is determined in advance may be entered, or the
frame information thereof may not be described. Even when $S_i = 0$
does not hold, for a frame where the screen change is very small and
virtually no change would be displayed in actual reproduction, the
lower limit value may be substituted or the frame information may not
be described. If the frame information is not described, the value of
$S_i$ may or may not be added to $S_{i+1}$.
[0136] The processing for calculating this display time can be
conducted when the frame information is prepared by the special
reproduction control information creating apparatus, but it can also be
conducted at the time of the special reproduction on the side of the
video reproduction apparatus.
[0137] Next, there will be explained a case in which 
the special reproduction is conducted. 
[0138] FIG. 26 shows one example of a procedure for N-times high-speed
reproduction on the basis of the special reproduction control
information that has been described.
[0139] At step S111, the display time $D'_i$ at the time of
reproduction is calculated on the basis of the reproduction rate
information. Since the display time information described in the frame
information is the standard display time, the display time of each
frame is calculated as $D'_i = D_i / N$ when reproduction at N-times
high speed is conducted.
[0140] At step S112, initialization for the display is conducted, and
i = 0 is set so that the first frame information is displayed.
[0141] At step S113, it is determined whether the display time $D'_i$
of the i-th frame information is larger than the threshold value of the
preset display time.
[0142] If the display time is larger, the video indicated by the video
location information included in the i-th frame information $F_i$ is
displayed for $D'_i$ seconds at step S114.
[0143] If the display time is not larger, the process proceeds to step
S115 to search in the forward direction for frame information whose
display time is not smaller than the threshold value; this becomes the
new i-th frame information. During the search, the display times of all
the frame information items whose display time is smaller than the
threshold value are added to the display time of the i-th frame
information, and the display times of those items are set to 0. This
processing is conducted because, when the display time at the time of
reproduction becomes very short, the time for preparing the video to be
displayed becomes longer than the display time, with the result that
the display cannot be conducted in time. Therefore, if the display time
is very short, the process proceeds to the next step without displaying
the video. At that time, the display time of the video which is not
displayed is added to the display time of the video to be displayed so
that the total display time remains unchanged.
[0144] At step S116, it is determined whether "i" is smaller than the
total number of frame information items in order to determine whether
any frame information remains that has not been displayed. If "i" is
smaller than the total number of frame information items, the process
proceeds to step S117 to increment "i" by one in preparation for the
display of the next frame information. When "i" reaches the total
number of frame information items, the reproduction processing is
completed.
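The loop of steps S111 to S117 can be sketched as a function that turns the standard display times into a playback schedule. Several details are assumptions made for illustration: the name `schedule_high_speed`, the use of "not smaller than" (`>=`) as the threshold test, and the handling of short frames at the very end of the list, whose time is added to the last displayed frame so that the total stays unchanged.

```python
def schedule_high_speed(display_times, n, threshold):
    """Return a list of (frame index, seconds) for N-times reproduction.

    Frames whose scaled time D'_i = D_i / N is below the threshold are
    skipped, and their time is carried forward to the next displayed frame.
    """
    schedule = []
    carry = 0.0                           # time of skipped frames not yet reassigned
    for i, d in enumerate(display_times):
        scaled = d / n                    # D'_i = D_i / N
        if scaled >= threshold:
            schedule.append((i, scaled + carry))
            carry = 0.0
        else:
            carry += scaled               # too short to prepare and display in time
    if carry and schedule:                # assumption: trailing short frames
        idx, secs = schedule[-1]
        schedule[-1] = (idx, secs + carry)
    return schedule
```

For standard display times of 4.0, 0.5, 0.5 and 2.0 seconds at N = 2 with a 0.5 second threshold, the two middle frames are skipped and their 0.25 seconds each are added to the last frame, so the total remains 3.5 seconds.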
[0145] FIG. 27 shows one example of a procedure for conducting the
N-times high-speed reproduction on the basis of the described special
reproduction control information by taking the display cycle as a
reference.
[0146] At step S121, the display time $D'_i$ of each frame is
calculated as $D'_i = D_i / N$ for the N-times high-speed reproduction.
Here, the calculated display time is actually associated with the
display cycle so that the video can be displayed.
[0147] FIG. 28 shows the relationship between the calculated display
time and the display cycle. The time axis 1300 shows the calculated
display time, while the time axis 1301 shows the display cycle based on
the display rate. If the display rate is f frames/sec, the interval of
the display cycle is 1/f sec.
[0148] Consequently, at step S122, the frame information $F_i$ whose
display time includes the start point of the display cycle is searched
for, and the video included in that frame information $F_i$ is
displayed for one display cycle (1/f sec) at step S123.

[0149] For example, the display cycle 1302 (FIG. 28) displays the
video of the frame information corresponding to this display time
because the display start point 1303 is included in the calculated
display time 1304.

[0150] As another method for making the display cycle correspond to
the frame information, the video at the location nearest to the start
point of the display cycle may be displayed, as shown in FIG. 29. If
the display time becomes smaller than the display cycle, like the
display time 1305 of FIG. 28, the display of the video may be omitted.
If the video is forcibly displayed, the display times before and after
the video are shortened so that the total display time remains
unchanged.

[0151] At step S124, it is determined whether the current display is
the final display or not. If the current display is the final display,
the processing is completed. If the display is not the final display,
the process proceeds to step S125 to conduct the processing of the next
display cycle.
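The display-cycle procedure of FIG. 27 can be sketched by laying the scaled display times end to end and asking, for the start point of every display cycle, which frame's interval contains it. The function name `frames_per_display_cycle` and the half-open interval convention (a frame owns its start instant but not its end instant) are assumptions made for illustration.

```python
from bisect import bisect_right
from itertools import accumulate

def frames_per_display_cycle(display_times, n, rate):
    """Return, for each display cycle of 1/rate seconds, the index of the
    frame information whose scaled display time contains the cycle start."""
    ends = list(accumulate(d / n for d in display_times))  # end time of each frame
    if not ends:
        return []
    shown = []
    k = 0
    while k / rate < ends[-1]:            # step S124: stop after the final display
        start = k / rate                  # start point of the k-th display cycle
        shown.append(bisect_right(ends, start))  # step S122: search the frame
        k += 1                            # step S125: next display cycle
    return shown
```

With standard display times of 1.5, 0.5 and 3.0 seconds, N = 2 and a display rate of 2 frames/sec, the second frame occupies only 0.25 seconds between two cycle start points and is therefore never displayed.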

[0152] FIG. 30 shows another example of a data structure for
describing the frame information. The frame information included in the
data structure of FIG. 8 or FIG. 14 summarizes a single original video.
A plurality of original videos can be summarized by expanding the frame
information. FIG. 30 shows such an example. Original video location
information 4202 for indicating the location of the original video file
is added to the original video information 4201 included in the
individual frame information. The file described in the original video
location information 4202 need not be handled in its entirety; it can
be used in a form in which only a portion (section) is extracted. In
this case, not only file information such as a file name but also
section information showing which section of the file is the object is
additionally described. Plural sections may be selected from the
original video.

[0153] Furthermore, if several kinds of original videos are present
and identification information is individually added to the videos, the
original video identification information may be described in place of
the original video location information.

[0154] FIG. 31 explains an example in which a plurality of original
videos are summarized and displayed by using the frame information to
which the original video location information is added. In this
example, three videos are summarized to display one summarized video.
With respect to the video 2, two sections 4301 and 4302 are taken out
in place of the whole section and are handled as respective videos. As
the frame information, together with this original video information,
the frame location of the respective representative video (4303 with
respect to 4301) is described as the video location information, while
the display time (4304 with respect to 4301) is described as the
display time information.
[0155] FIG. 32 explains another example in which a plurality of
original videos are summarized and displayed by using the frame
information to which the original video location information is added.
In this example, three videos are summarized to display one summarized
video. With respect to the video 2, a portion of the section is taken
out in place of the whole section. A plurality of sections may be taken
out as described in FIG. 31. As the frame information, together with
these items of the original video information (for example, the section
information 4401 in addition to the video 2), the storage location of
the respective representative video files 4402 is described as the
video location information and the display time 4403 is described as
the display time information.

[0156] The addition of the original video location information to the
frame information explained in these examples can be applied in exactly
the same way to the case in which a set of frames is used as the video
location information, so that a plurality of original videos are
summarized and displayed.
[0157] FIG. 33 shows another data structure for describing the frame
information. In this data structure, in addition to the video location
information 101, the display time information 121 and the original
video information 3701 which have already been explained, motion
information 4501 and interest region information 4502 are added. The
motion information 4501 describes the magnitude of the motion (the
scene change quantity) in the section (the section described in the
original video information) of the original video corresponding to the
frame information. The interest region information 4502 describes a
region of particular interest in the video described in the video
location information.

[0158] The motion information can be used for calculating the display
time of the video described in the video location information, in the
same way as the display time is calculated from the motion of the video
as shown in FIG. 22. In this case, even when the display time
information is omitted and only the motion information is described,
special reproduction such as high-speed reproduction can be conducted
in the same manner as in the case in which the display time is
described. In this case, the display time is calculated at the time of
reproduction.
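As a rough illustration of the data structure of FIG. 33 and of calculating the display time at reproduction time, the sketch below models one item of frame information as a Python dataclass. The field names, the use of a frame number as the video location, and the `(x, y, width, height)` form of the interest region are all assumptions; the description does not prescribe a concrete encoding.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class FrameInfo:
    video_location: int                        # video location information (101)
    display_time: Optional[float] = None       # display time information (121), seconds
    motion: Optional[float] = None             # motion information (4501)
    interest_region: Optional[Tuple[int, int, int, int]] = None  # (4502): x, y, w, h

def resolve_display_times(items: List[FrameInfo], total_time: float) -> List[float]:
    """Use the described display times if every item has one; otherwise
    derive them from the motion information as D_i = T * S_i / sum_j S_j."""
    if all(item.display_time is not None for item in items):
        return [item.display_time for item in items]
    if any(item.motion is None for item in items):
        raise ValueError("an item has neither display time nor motion information")
    total_motion = sum(item.motion for item in items)
    if total_motion == 0:
        raise ValueError("total motion is zero")
    return [total_time * item.motion / total_motion for item in items]
```

A player built this way can accept frame information that carries only motion information and still produce display times when reproduction starts.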

[0159] Both the display time information and the motion information
can be described at the same time. In that case, an application for
displaying uses the required one of the two, or uses both in
combination, in accordance with the processing.

[0160] For example, a display time calculated irrespective of the
motion is described in the display time information. A method of
calculating the display time so as to cut out important scenes from the
original video corresponds to this. At the time of the high-speed
reproduction of the summarized contents calculated in this manner, the
motion information is used so that a portion with a large motion is
reproduced slowly while a portion with a small motion is reproduced
quickly, with the result that high-speed reproduction in which little
is overlooked is enabled.

[0161] The interest region information is used when a region of
particular interest is present in the video described in the video
location information. For example, the faces of persons who seem to be
important correspond to this. At the time of displaying a video
including such interest region information, a square frame may be
overlaid on the display so that the interest region can be easily
detected. The frame display is not indispensable, and the video may
simply be displayed as it is.
[0162] The interest region information can also be used for processing
and displaying the special reproduction control information such as the
frame information. For example, if only a part of the frame information
is reproduced and displayed, the frame information including the
interest region information is displayed with priority. Further, frame
information including a square area with a large area may be assumed to
have higher importance, thereby making it possible to display the video
selectively.

[0163] In the above, an example in which the processing is conducted
on the basis of the scene change quantity has been explained.
Hereinafter, a case in which importance information is used will be
explained.

[0164] FIG. 34 is a view showing examples of a data structure of the
frame information attached to the video.
[0165] Importance information 122 is described in addition to or in
place of the display time control information 102 in the data structure
of the frame information of FIG. 1. The display time is calculated
based on the importance information 122.

[0166] The importance information 122 represents the importance of the
corresponding frame (or set of frames). The importance is represented,
for example, as an integer in a fixed range (for example, 0 to 100), or
as a real number in a fixed range (for example, 0 to 1). Otherwise, the
importance information 122 may be represented as an integer or a real
number without setting an upper limit. The importance information 122
may be attached to all the frames of the video, or only to the frames
at which the importance changes.

[0167] In this case as well, it is possible to take any of the forms of
FIGS. 9, 10, 12, and 13. The frame extraction methods of FIGS. 11, 20,
and 21 can also be used. In this case, the scene change quantity of
FIGS. 11, 20, and 21 may be replaced by the importance.
[0168] In the example explained above, the display time is set
according to the scene change quantity. However, the display time may
also be set according to the importance information. Hereinafter, the
method for setting the display time will be explained.
[0169] In the setting of the display time on the basis of the scene
change quantity exemplified above, in order to make the video contents
easy to understand, the display time is set long where the change
quantity is large and short where the change quantity is small. In the
setting of the display time on the basis of the importance, the display
time is set long where the importance is high and short where the
importance is low. That is, since the method for setting the display
time based on the importance is basically similar to the method for
setting the display time based on the scene change quantity, the method
will be explained only briefly.

[0170] FIG. 36 shows one example of the basic processing procedure in
this case.
[0171] At step S191, the importance of all frames of the original
video is calculated. A concrete method thereof will be exemplified
later.
[0172] At step S192, the total of the importance from each description
object frame described in the video location information to the next
description object frame is calculated.

[0173] FIG. 37 shows the change in the importance for each frame.
Reference numeral 2200 denotes the importance. When the display time of
the video having the location information of the frame information
$F_i$ is calculated, the importance is accumulated over the section up
to $F_{i+1}$, which is the next description object frame location. The
accumulation result is the area $S'_i$ of the hatched portion 2202.

[0174] At step S193, the display time of each frame is calculated.
Suppose that the ratio of the display time allocated to the video at
each frame location $F_i$ to the reproduction time is set to
$S'_i / \sum_j S'_j$. When the total reproduction time is set to $T$,
the display time of each video becomes
$D_i = T \times S'_i / \sum_j S'_j$. The value of the total
reproduction time $T$ is a standard reproduction time, which is defined
as the total reproduction time of the original video.

[0175] When the total of the importance becomes $S'_i = 0$, the preset
lower limit value (for example, 1) may be described, or the frame
information may not be described. Even if $S'_i = 0$ does not hold,
when the importance is very small and it is assumed that such a frame
is virtually not displayed, the lower limit value may be described or
the frame information may not be described. If the frame information is
not described, the value of $S'_i$ may or may not be added to
$S'_{i+1}$.
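The two options for sections with zero importance can be sketched as one function. The name `display_times_from_importance`, the `lower_limit` parameter, and the decision to return a mapping from section index to display time are illustrative assumptions. When `lower_limit` is `None`, zero-importance sections are simply not described; otherwise the lower limit value is substituted for any smaller total.

```python
def display_times_from_importance(section_importance, total_time, lower_limit=None):
    """Return {section index: D_i} with D_i = T * S'_i / sum_j S'_j."""
    if lower_limit is None:
        # Option 1: frame information with zero importance is not described.
        kept = {i: s for i, s in enumerate(section_importance) if s > 0}
    else:
        # Option 2: the preset lower limit value is described instead.
        kept = {i: max(s, lower_limit) for i, s in enumerate(section_importance)}
    grand_total = sum(kept.values())
    if grand_total == 0:
        raise ValueError("no section has positive importance")
    return {i: total_time * s / grand_total for i, s in kept.items()}
```

For section totals of 0, 30 and 70 and T = 10 seconds, the first option drops the first section and allocates 3 and 7 seconds to the other two.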
[0176] As shown in FIG. 34, in the data structure of the frame
information of FIG. 1, the video location information 101, the display
time information 121 and the importance information 122 may be
described in each frame information "i". At the time of the special
reproduction, the following cases are possible: the display time
information 121 is used but the importance information 122 is not used;
the importance information 122 is used but the display time information
121 is not used; both the importance information 122 and the display
time information 121 are used; and neither the importance information
122 nor the display time information 121 is used.

[0177] The processing of calculating the display time can be conducted
when the frame information is prepared by the special reproduction
control information creating apparatus. However, the processing may
also be conducted on the side of the video reproduction apparatus at
the time of the special reproduction.

[0178] Next, a method (step S191 of FIG. 36) for calculating the
importance of each frame or scene (video frame section) will be
explained.

[0179] Since various factors are normally intertwined in the judgment
as to whether a certain scene of a video is important, the most
appropriate method for calculating the importance is one in which a
person determines the importance. In this method, an importance
evaluator evaluates the importance for each scene of the video, or at
each fixed interval, and the result is input as importance data. The
importance data referred to here is a correspondence table between
frame numbers or times and importance values. In order to avoid
subjective evaluation of importance, a plurality of importance
evaluators may evaluate the same video, and the average value (or the
median or the like) for each scene or each video frame section is
calculated to finally determine the importance. With such manual input
of the importance data, it is possible to reflect in the importance
vague impressions and a plurality of elements which cannot be expressed
in words.

[0180] In order to save the trouble of determination by a person, it
is preferable to assume a phenomenon which is likely to appear in a
video scene that seems to be important, and to use processing which
automatically evaluates that phenomenon and converts it into
importance. Here, some examples in which the importance is
automatically created are shown.
[0181] FIG. 38 shows an example of a processing procedure for
automatically calculating importance data on the basis of the idea that
a scene having a large sound level is important. FIG. 38 also serves as
a functional block diagram.

[0182] In the sound level calculation processing at step S210, when
the sound data attached to the video is input, the sound level at each
time is calculated. Since the sound level changes greatly from instant
to instant, smoothing processing or the like may be conducted in the
sound level calculation processing at step S210.
[0183] In the importance calculation processing at step S211,
processing is conducted for converting the sound level output as a
result of the sound level calculation processing into importance. For
example, the input sound level is linearly converted into a value of 0
to 100, with a preset lowest sound level mapped to 0 and a preset
highest sound level mapped to 100. A sound level not more than the
lowest sound level is set to 0, while a sound level not less than the
highest sound level is set to 100. As a result of the importance
calculation processing, the importance at each time is calculated and
output as importance data.
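The smoothing of step S210 and the linear conversion of step S211 can be sketched as two small functions. The moving-average smoothing, the window size, and the function names are assumptions chosen for illustration; the description only says that smoothing "or the like" may be applied.

```python
def smooth(levels, window=3):
    """Moving average over `window` samples, shortened at both ends."""
    half = window // 2
    out = []
    for i in range(len(levels)):
        lo, hi = max(0, i - half), min(len(levels), i + half + 1)
        out.append(sum(levels[lo:hi]) / (hi - lo))
    return out

def sound_level_to_importance(level, lowest, highest):
    """Map a sound level linearly onto 0..100, clamping outside the range."""
    if highest <= lowest:
        raise ValueError("highest must be greater than lowest")
    if level <= lowest:
        return 0.0
    if level >= highest:
        return 100.0
    return 100.0 * (level - lowest) / (highest - lowest)
```

With a lowest level of 20 and a highest level of 80 (in whatever unit the sound level is measured), a level of 50 maps to an importance of 50, and anything at or above 80 maps to 100.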

[0184] FIG. 39 shows an example of a processing procedure of another
method for automatically calculating the importance level. FIG. 39 also
serves as a functional block diagram.

[0185] The processing of FIG. 39 determines that a scene in which
important words registered in advance appear frequently in the sound
attached to the video is important.

[0186] In the sound recognition processing at step S220, when the
sound data attached to the video is input, the language (words) spoken
by a person is converted into text data.
[0187] In the important word dictionary 221, words which are likely to
appear in important scenes are registered. If the degree of importance
differs among the registered words, a weight is attached to each of the
registered words.

[0188] In the word collation processing at step S222, the text data
which is an output of the sound recognition processing is collated with
the words registered in the important word dictionary 221 to determine
whether or not important words have been spoken.
[0189] In the importance calculation processing at step S223, the
importance in each scene of the video or at each time is calculated
from the result of the word collation processing. In this calculation,
the number of appearances of important words and the weights of the
important words are used; for example, the importance around the time
at which an important word has appeared (or of the scene in which the
important word has appeared) is increased by a constant value, or by a
value proportional to the weight of the important word. As a result of
the importance calculation processing, the importance at each time is
calculated and output as importance data.

[0190] If the weights of all the words are set to the same value, the
important word dictionary 221 becomes unnecessary. This is because it
is then assumed that a scene in which many words are spoken is
important. In this case, in the word collation processing at step S222,
processing of counting the number of words output from the sound
recognition processing is conducted. Not only the number of words but
also the number of characters may be counted.
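The word collation and importance calculation of steps S222 and S223 can be sketched as follows. The input format (a list of `(time in seconds, word)` pairs from a recognizer), the bucketing of time into fixed slots, and the names `importance_from_words` and `slot_seconds` are assumptions made for illustration. Passing no dictionary corresponds to the case in which every word has the same weight and words are simply counted.

```python
from collections import defaultdict

def importance_from_words(recognized, dictionary=None, slot_seconds=1.0):
    """Return {time slot index: importance} from recognized words.

    dictionary maps lowercase important words to weights; if it is None,
    every recognized word counts with weight 1.
    """
    scores = defaultdict(float)
    for time_sec, word in recognized:
        if dictionary is None:
            weight = 1.0                              # count every word equally
        else:
            weight = dictionary.get(word.lower(), 0.0)  # collation with the dictionary
        if weight:
            scores[int(time_sec // slot_seconds)] += weight
    return dict(scores)
```

The same function could be applied to text obtained from on-screen characters instead of speech, since only the `(time, word)` pairs matter to it.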

[0191] FIG. 40 shows an example of a processing procedure of yet
another method for automatically calculating the importance level.
FIG. 40 also serves as a functional block diagram.

[0192] The processing of FIG. 40 determines that a scene in which many
important words registered in advance appear in the telop (on-screen
caption) appearing in the video is important.

[0193] In the telop recognition processing at step S230, the character
location in the video is specified, and the characters are recognized
by converting the video region at the character location into a binary
image. The recognized result is output as text data.
[0194] The important word dictionary 231 is the same as the important
word dictionary 221 of FIG. 39.
[0195] In the word collation processing at step S232, in the same
manner as at step S222 in the procedure of FIG. 39, the text data which
is an output of the telop recognition processing is collated with the
words registered in the important word dictionary 231 to determine
whether or not important words have appeared.

[0196] In the importance calculation processing at step S233, the
importance at each scene or at each time is calculated from the number
of appearances of important words and the weights of the important
words, in the same manner as at step S223 in the procedure of FIG. 39.
As a result of the importance calculation processing, the importance at
each time is determined and output as importance data.

[0197] If the weights of all the words are set to the same value, the
important word dictionary 231 becomes unnecessary. This is because it
is then assumed that a scene in which many words appear is an important
scene. In this case, in the word collation processing at step S232,
processing is conducted for simply counting the number of words output
from the telop recognition processing. Not only the number of words but
also the number of characters may be counted.
[0198] FIG. 41 shows an example of a processing procedure of still
another method for automatically calculating the importance level.
FIG. 41 also serves as a functional block diagram.

[0199] The processing of FIG. 41 determines that the larger the
character size of the telop appearing in the video, the more important
the scene is.
[0200] In the telop detection processing at step S240, processing is
conducted for specifying the location of a character string in the
video.
[0201] In the character size calculation processing at step S241,
individual characters are extracted to calculate the average value or
the maximum value of the size (area) of the characters.

[0202] In the importance calculation processing at step S242, an
importance proportional to the character size output from the character
size calculation processing is calculated. If the calculated importance
is too large or too small, threshold processing is conducted to
restrict the importance to a preset range. As a result of the
importance calculation processing, the importance at each time is
calculated and output as importance data.

[0203] FIG. 42 shows an example of a processing procedure of still
another method for automatically calculating the importance level.
FIG. 42 also serves as a functional block diagram.

[0204] The processing of FIG. 42 determines that a scene in which
human faces appear in the video is important.

[0205] In the face detection processing at step S250, processing is
conducted for detecting areas which look like a human face in the
video. As a result of the processing, the number of areas determined to
be a human face (the number of faces) is output. Information on the
size (area) of each face may be output at the same time.

[0206] In the importance calculation processing at step S251, the
number of faces which is an output of the face detection processing is
multiplied by a constant to calculate the importance. If the output of
the face detection processing includes face size information, the
calculation is conducted so that the importance increases with an
increase in the size of the faces. For example, the area of the face is
multiplied by a constant to calculate the importance. As a result of
the importance calculation processing, the importance at each time is
calculated and output as importance data.
[0207] FIG. 43 shows an example of a processing procedure of still
another method for automatically calculating the importance level.
FIG. 43 also serves as a functional block diagram.

[0208] In the processing of FIG. 43, it is determined that a scene in
which a video similar to a video registered in advance appears is
important.
[0209] The video which should be determined to be important is
registered in the important scene dictionary 260. The video is recorded
as raw data or in a compressed form. Instead of the video itself, a
characteristic quantity of the video (a color histogram, a frequency or
the like) may be recorded.
[0210] In the similarity/non-similarity calculation processing at step
S261, the similarity or non-similarity between the video registered in
the important scene dictionary 260 and the input video data is
calculated. As the non-similarity, the total of the squared errors or
the total of the absolute differences is used. If video data is
recorded in the important scene dictionary 260, the total of the
squared errors and the total of the absolute differences over the
corresponding pixels are calculated as the non-similarity. If the color
histogram of the video is recorded in the important scene dictionary
260, the same color histogram is calculated for the input video data,
and the total of the squared errors and the total of the absolute
differences between the histograms are set as the non-similarity.

[0211] In the importance calculation processing at step S262, the
importance is calculated from the similarity/non-similarity which is an
output of the similarity/non-similarity calculation processing. The
importance is calculated in such a manner that larger similarity
provides greater importance if the similarity is input, while larger
non-similarity provides smaller importance if the non-similarity is
input. As a result of the importance calculation processing, the
importance at each time is calculated and output as importance data.
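The two non-similarity measures and a conversion to importance can be sketched as below. The two measures follow the text; the mapping `scale / (1 + non_similarity)` is only one assumed way of making the importance fall as the non-similarity grows, and the function names are likewise illustrative.

```python
def sum_squared_error(a, b):
    """Total of the squared errors between two histograms or pixel lists."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sum_absolute_difference(a, b):
    """Total of the absolute differences between two histograms or pixel lists."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))

def importance_from_non_similarity(non_similarity, scale=100.0):
    """Assumed mapping: identical inputs give `scale`, larger distances give less."""
    return scale / (1.0 + non_similarity)
```

Comparing the histogram of an input frame with each registered histogram and keeping the smallest non-similarity would then give the importance for that frame.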
[0212] Furthermore, as another method for automatically calculating
the importance, a scene having a high instantaneous viewing rate may be
set as an important scene. The data on the instantaneous viewing rate
is obtained by totaling the results of a viewing rate survey, and the
importance is calculated by multiplying the instantaneous viewing rate
by a constant. Needless to say, there are various other methods.

[0213] The importance calculation processing may be
conducted solely, or a plurality of data items may be used
at the same time to calculate the importance. In the latter
case, for example, the importance of one video is
calculated with several different methods, and the final
importance is calculated as an average value or a maximum
value.
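
As a rough sketch of the importance calculation, the following Python functions map a similarity or non-similarity to an importance and combine several importance values by average or maximum. The particular decreasing mapping `1 / (1 + x)` is an assumption; the text only requires that larger non-similarity give smaller importance.

```python
from typing import Sequence


def importance_from_similarity(similarity: float) -> float:
    """Larger similarity gives greater importance (identity mapping assumed)."""
    return similarity


def importance_from_non_similarity(non_similarity: float) -> float:
    """Larger non-similarity gives smaller importance (assumed mapping)."""
    return 1.0 / (1.0 + non_similarity)


def combine_importance(values: Sequence[float], mode: str = "average") -> float:
    """Combine importance values from several methods into a final value."""
    if not values:
        raise ValueError("at least one importance value is required")
    if mode == "average":
        return sum(values) / len(values)
    if mode == "maximum":
        return max(values)
    raise ValueError("mode must be 'average' or 'maximum'")
```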
[0214] In the above embodiment, the explanation has been
given by citing the scene change quantity and the
importance. However, it is possible to use one item of
information or a plurality of items of information
(described in the frame information) together with, or
instead of, the scene change quantity or the importance.
[0215] Next, there will be explained a case in which
information for the control of reproduction/non-reproduction
is added to the frame information (see FIG. 1).
[0216] It is sometimes desired that only a specific scene
or a part thereof (for example, a highlight scene), or only
a scene or a part thereof in which a specific person
appears, is reproduced. Thus, there is a demand for
watching only a portion of the video.
[0217] In order to satisfy this demand, the
reproduction/non-reproduction information may be added to
the frame information for controlling the reproduction or
the non-reproduction. As a consequence, only a part of the
video is reproduced, or only a part of the video is not
reproduced, on the basis of the
reproduction/non-reproduction information.

[0218] FIGS. 44, 45, and 46 show examples of a data
structure in which the reproduction/non-reproduction
information is added.
[0219] FIG. 44 shows a data structure in which the
reproduction/non-reproduction information 123 is added to
the data structure of FIG. 8. FIG. 45 shows a data
structure in which the reproduction/non-reproduction
information 123 is added to the data structure of FIG. 34.
FIG. 46 shows a data structure in which the
reproduction/non-reproduction information 123 is added to
the data structure of FIG. 35. Though not shown, it is
possible to add the reproduction/non-reproduction
information 123 to the data structure of FIG. 1.
[0220] The reproduction/non-reproduction information 123
may be binary information specifying whether the video is
reproduced or not, or a continuous value such as a
reproduction level or the like.
[0221] For example, in the latter case, when the
reproduction level exceeds a certain threshold value at the
time of reproduction, the video is reproduced. When the
reproduction level is less than the threshold value, the
video is not reproduced. The user can directly or
indirectly specify the threshold value.
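
The threshold rule for a continuous reproduction level can be sketched as below. The function names are hypothetical, and the treatment of a level exactly equal to the threshold (here: not reproduced) is an assumption, since the text leaves that case open.

```python
from typing import List, Sequence


def should_reproduce(level: float, threshold: float) -> bool:
    """A frame is reproduced only when its level exceeds the threshold.

    A level equal to the threshold is treated as non-reproduction,
    which is an assumption not stated in the text.
    """
    return level > threshold


def select_frames(levels: Sequence[float], threshold: float) -> List[int]:
    """Return the indices of the frames that would be reproduced."""
    return [i for i, level in enumerate(levels)
            if should_reproduce(level, threshold)]
```

Raising the threshold yields a shorter digest; lowering it toward the minimum level approaches reproduction of the whole video.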
[0222] The reproduction/non-reproduction information 123
may be stored as independent information. If the
reproduction or non-reproduction is selectively specified,
the non-reproduction can be specified when the display time
shown in the display time information 121 is set to a
specific value (for example, 0 or -1). Alternatively, the
non-reproduction can be specified when the importance
indicated by the importance information 122 is set to a
specific value (for example, 0 or -1). In these cases, the
reproduction/non-reproduction information 123 need not be
added.

[0223] If the reproduction or non-reproduction is specified
with a level value, the display time information 121 and/or
the importance information 122 (represented by the level
value) can be used as a substitute.
[0224] If the reproduction/non-reproduction information 123
is maintained as independent information, the quantity of
data increases by that amount. In return, it is possible to
see a digest of the video by not reproducing the portion
specified as non-reproduction on the reproduction side, and
it is also possible to see the whole video by reproducing
the portion specified as non-reproduction. If the
reproduction/non-reproduction information 123 is not
maintained as independent information, it is necessary to
appropriately change the display time specified, for
example, as 0 in order to see the whole video by
reproducing the portion specified as non-reproduction.

[0225] The reproduction/non-reproduction information 123
may be input manually or may be determined from some
conditions. For example, when the motion information of the
video is equal to or greater than a given value, the video
is reproduced, and otherwise the video is not reproduced,
so that only portions with brisk motion are reproduced.
When it is determined from color information that the
amount of skin color is larger or smaller than a given
value, only the scenes where a person appears can be
reproduced. A method for calculating the information from
the magnitude of sound, and a method for calculating the
information from reproduction program information which is
input in advance, can also be considered. The importance
may be calculated with some technique to create the
reproduction/non-reproduction information 123 from the
importance information. When the
reproduction/non-reproduction information is set to a
continuous value, the calculated importance may be
converted into the reproduction/non-reproduction
information.

[0226] FIG. 47 shows an example in which
reproduction/non-reproduction control is carried out so
that video is reproduced on the basis of the
reproduction/non-reproduction information 123.

[0227] In FIG. 47, it is supposed that the original video
2151 is reproduced on the basis of the video frame location
information represented with F1 through F6 or the video
frame group location information 2153 and the display time
information represented with D1 through D6. At this time,
it is supposed that the reproduction/non-reproduction
information is added to the display time information 2154.
In this example, the sections of D2, D4 and D6 can be
reproduced, and other sections cannot be reproduced. The
sections of D1, D2, D4 and D6 are continuously reproduced
as the reproduction video 2152 (while other sections cannot
be reproduced).

[0228] For example, for the frame $F_i$ of the reproduction
video, let the display time be $D^{+}_{i}$ when the
reproduction/non-reproduction information 123 shows
reproduction, and $D^{-}_{i}$ when the
reproduction/non-reproduction information 123 shows
non-reproduction. Then $\sum_i D^{+}_{i} = T'$, where $T'$
is the total time of the reproduction portion of the
original video. Normally, the display time $D^{+}_{i}$ is
set to the time which is required to reproduce the original
video at normal speed. The reproduction speed may instead
be set to a predetermined high speed, and information may
be described as to what multiple of the normal speed is to
be used. When it is desired that the video is reproduced at
N times speed, the display time $D^{+}_{i}$ of each
reproduction portion is multiplied by $1/N$. For example,
in order to perform reproduction within a predetermined
time $D'$, the display time $D^{+}_{i}$ of each
reproduction portion may be multiplied by
$D' / \sum_i D^{+}_{i}$ before display.
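
The two scalings described above (dividing by N for N times speed, and fitting the reproduced portions into a predetermined total time) can be sketched as follows. The function names are hypothetical, and display times are assumed to be given in seconds.

```python
from typing import List, Sequence


def scale_for_speed(display_times: Sequence[float], n: float) -> List[float]:
    """Multiply each display time by 1/N for N times speed reproduction."""
    if n <= 0:
        raise ValueError("speed factor must be positive")
    return [d / n for d in display_times]


def fit_to_total(display_times: Sequence[float],
                 target_total: float) -> List[float]:
    """Multiply each display time by D' / sum(D+) so the total equals D'."""
    total = sum(display_times)
    if total <= 0:
        raise ValueError("total display time must be positive")
    factor = target_total / total
    return [d * factor for d in display_times]
```

Both functions take only the display times of the portions that are actually reproduced, so non-reproduced sections have to be filtered out beforehand.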

[0229] If the display time of each frame (or frame group)
is determined on the basis of the frame information, the
determined display time may be adjusted.
[0230] In a method in which the calculated display time is
not adjusted, the display time which is calculated without
taking the non-reproduction sections into consideration is
used as it is. Consequently, when a display time exceeding
0 was originally allocated to a non-reproduction section,
the whole display time is shortened by the allocated
amount.
[0231] In a method in which the calculated display time is
adjusted, for example, if a display time exceeding 0 was
originally allocated to a non-reproduction section, the
adjustment is made by multiplying the display time of each
of the frames (or frame groups) to be reproduced by a
constant so that the whole display time becomes equal to
the time that would result if the non-reproduction section
were also reproduced.
[0232] The user may make a selection as to whether the
adjustment is to be made.

[0233] If the user specifies the N times reproduction, the
N times high-speed reproduction processing may be conducted
without the adjustment of the calculated display time. The
N times high-speed reproduction processing may also be
conducted on the basis of the display time after the
adjustment of the calculated display time in the above
manner (the display time of the former becomes shorter).
[0234] The user may specify the whole display time. In this
case as well, for example, the display time of each frame
(or frame group) to be reproduced is multiplied by a
constant to make an adjustment so that the whole display
time becomes equal to the specified whole display time.
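
The adjustment that keeps the whole display time unchanged when some sections are skipped can be sketched as follows. This is only one reading of the adjustment, under the assumption that each frame carries a display time and a boolean reproduction flag; the function name is hypothetical.

```python
from typing import List, Sequence


def adjust_display_times(display_times: Sequence[float],
                         reproduce: Sequence[bool]) -> List[float]:
    """Stretch reproduced frames so the total matches the unskipped total.

    Frames flagged as non-reproduction get a display time of 0, and the
    remaining frames are multiplied by one common constant.
    """
    whole = sum(display_times)
    played = sum(d for d, r in zip(display_times, reproduce) if r)
    if played <= 0:
        raise ValueError("no reproducible frame with a positive display time")
    factor = whole / played
    return [d * factor if r else 0.0
            for d, r in zip(display_times, reproduce)]
```

Omitting this step corresponds to the unadjusted method, in which the whole display time simply becomes shorter by the time allocated to the skipped sections.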

[0235] FIG. 48 shows one example of the processing
procedure for reproducing only a portion of the video on
the basis of the reproduction/non-reproduction information
123.
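
The procedure of FIG. 48 (steps S161 to S164) can be sketched roughly as the loop below. The `PlayFrame` record, the `show` callback and the `force_all` flag (standing in for a user preference to also reproduce the non-reproduction portions) are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class PlayFrame:
    location: int         # video location information (assumed: frame index)
    display_time: float   # display time information, in seconds
    reproduce: bool       # reproduction/non-reproduction information 123


def play(frames: Iterable[PlayFrame],
         show: Callable[[int, float], None],
         force_all: bool = False) -> None:
    """Display only the frames marked for reproduction."""
    for info in frames:                   # S161: until all frames are processed
        # S162: the frame information has been read into `info`
        if force_all or info.reproduce:   # S163: reproduce this frame?
            show(info.location, info.display_time)  # S164: display it
        # otherwise the frame is skipped and the next one is processed
```

A caller supplies `show`, for example a function that decodes the frame at `location` and holds it on screen for `display_time` seconds.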
[0236] At step S162, the frame information (video location
information and display time information) is read, and it
is determined at step S163 whether the frame is to be
reproduced, from the reproduction/non-reproduction
information in the display time information.
[0237] When it is determined that the reproduction is to be
conducted, the frame is displayed for the duration of the
display time at step S164. When it is determined that the
reproduction is not to be conducted, the frame is not
displayed and the processing moves to the next frame.

[0238] It is determined at step S161 whether or not the
whole video to be reproduced has been processed. When the
whole video has been processed, the reproduction processing
is ended.

[0239] When it is determined at step S163 whether or not
the frame is to be reproduced, it is desired in some cases
that the determination depends on the taste of the user. In
that case, it is determined from the user profile, in
advance before the reproduction of the video, whether or
not the non-reproduction portion is reproduced. When the
non-reproduction portion is reproduced, the frame is
reproduced without fail at step S164.
[0240] In addition, when the reproduction/non-reproduction
information is described as a continuous value, a threshold
value for differentiating the reproduction and the
non-reproduction is determined from the user profile, and
the reproduction or the non-reproduction is determined
depending on whether or not the
reproduction/non-reproduction information exceeds the
threshold value. Apart from using the user profile, for
example, the threshold value may be calculated from the
importance set for each frame, or information may be
received from the user, in advance or in real time, as to
whether the reproduction or non-reproduction is to be
performed.
[0241] In this manner, it becomes possible to reproduce
only a portion of the video by adding to the frame
information the reproduction/non-reproduction information
123 for controlling whether the video is reproduced or not,
with the result that it becomes possible to reproduce only
the highlight scene or only the scene in which a person or
an object of interest appears.
[0242] Next, there will be explained a describing method
for the case where the location information of media (for
example, text or sound) other than the video, associated
with the video to be displayed, and the time for displaying
or reproducing the media are added to the frame information
(see FIG. 1) as additional information.
[0243] In FIG. 8, the video location information 101 and
the display time information 102 are included in each frame
information 100. In FIG. 34, the video location information
101 and importance information 122 are included in each
frame information 100. In FIG. 35, the video location
information 101, the display time information 121, and
importance information 122 are included in each frame
information 100. In FIGS. 44, 45, and 46, there is further
shown an example in which the reproduction/non-reproduction
information 123 is included in each frame information 100.
In any example, 0 or more sets of sound location
information 2703 and sound reproduction time information
2704, and 0 or more sets of text information 2705 and text
display time information 2706 (however, 1 or more in any of
the information) may be added.
[0244] FIG. 49 shows an example in which one set of sound
location information 2703 and sound reproduction time
information 2704 and N sets of text information 2705 and
text display time information 2706 are added to an example
of the data structure of FIG. 8.
[0245] The sound is reproduced for the time indicated by
the sound reproduction time information 2704 from the
location indicated by the sound location information 2703.
The object of reproduction may be sound information
attached to the video from the beginning. Alternatively,
background music may be created and newly added.
[0246] As for the text, the text indicated by the text
information 2705 is displayed for the time indicated by the
text display time information 2706. A plurality of items of
text information may be added to one video frame.
[0247] The time when the sound reproduction and the text
display are started is the same as the time when the
associated video frame is displayed. The sound reproduction
time and the text display time are set within the range of
the display time of the associated video frame. If
continuous sound is reproduced over a plurality of video
frames, the sound location information and the reproduction
time may be set to be continuous.

[0248] With such a method, summarized sound and summarized
text can be provided.
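
A frame information record of the kind shown in FIG. 49 might be modelled as below. The class and field names, and the use of seconds for times and an integer for the video location, are assumptions for illustration; only the reference numerals in the comments come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SoundRef:
    location: float   # sound location information 2703 (assumed: offset in s)
    duration: float   # sound reproduction time information 2704


@dataclass
class TextItem:
    text: str         # text information 2705
    duration: float   # text display time information 2706


@dataclass
class MediaFrameInfo:
    video_location: int                  # video location information 101
    display_time: float                  # display time of the video frame
    sound: Optional[SoundRef] = None     # zero or one sound in this sketch
    texts: List[TextItem] = field(default_factory=list)

    def fits(self) -> bool:
        """True if all sound and text times lie within the display time."""
        durations = [t.duration for t in self.texts]
        if self.sound is not None:
            durations.append(self.sound.duration)
        return all(d <= self.display_time for d in durations)
```

The `fits` check mirrors the rule that sound and text start together with the frame and must end within the frame's display time.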
[0249] FIG. 50 shows one example of a method for describing
the sound information separately from the frame
information. This is an example of a data structure for
reproducing sound associated with the video frame which is
displayed when the special reproduction is conducted. A set
of the location information 2801 showing the location of
the sound to be reproduced, the reproduction start time
2802 at which the sound reproduction is started, and the
reproduction time 2803 for which the reproduction is
continued is set as one item of sound information 2800, and
the sound information is described as an array of such
items.

[0250] FIG. 51 shows a data structure for describing the
text information. The data structure has the same structure
as that of the sound information: a set of character code
location information 2901 of the text to be displayed, a
display start time 2902, and a display time 2903 is set as
one item of text information 2900, and the text information
is described as an array of such items. Instead of the
character code location information 2901, location
information may be used which indicates a location where
the character code is stored, or a location where the
character is stored as a video.

[0251] The above sound information or text information is
synchronized with the display of the video frame, and is
presented as information associated with the video frame or
with a given video frame section in which the displayed
video frame is present. As shown in FIG. 52, the
reproduction or the display of the sound information or the
text information is started with the lapse of time shown by
the time axis 3001. First, the video 3002 is displayed and
reproduced for the described display time in the order in
which the respective video frames are described. Reference
numerals 3005, 3006 and 3007 denote respective video
frames, and a predetermined display time is allocated to
each. The sound 3003 is reproduced when the reproduction
start time described in each item of sound information
comes. When the reproduction time described in a similar
manner has elapsed, the reproduction is stopped. As shown
in FIG. 52, a plurality of sounds 3008 and 3009 may be
reproduced. In a similar manner to the sound, the text 3004
is displayed when the display start time described in each
item of text information comes. When the described display
time has elapsed, the display is stopped. A plurality of
texts 3010 and 3011 may be displayed at the same time.
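
The timeline behaviour of FIG. 52 can be illustrated with a small sketch that, for a given point on the time axis, returns the sound or text items that are currently active. The `TimedItem` record is a hypothetical common shape for the items of FIGS. 50 and 51; treating the end of an interval as exclusive is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TimedItem:
    location: str     # location information 2801 / 2901 (assumed: a label)
    start: float      # start time 2802 / 2902, in seconds
    duration: float   # reproduction or display time 2803 / 2903, in seconds


def active_items(items: List[TimedItem], t: float) -> List[TimedItem]:
    """Items whose interval [start, start + duration) contains time t."""
    return [it for it in items if it.start <= t < it.start + it.duration]
```

Because each item carries its own start time, several items may overlap, which corresponds to several sounds or texts being presented at once.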
[0252] It is not required that the sound reproduction start
time and the text display start time coincide with the time
at which the video frame is displayed. Nor is it required
that the sound reproduction time and the text display time
coincide with the display time of the video frame. These
times can be freely set. Conversely, the display time of
the video frame may be changed in accordance with the sound
reproduction time and the text display time.

[0253] These times can be set manually.

[0254] In order to save the trouble of manual
determination, it is preferable to identify a phenomenon
which is likely to appear in a video scene which seems to
be important, and to set these times automatically.
Hereinafter, several examples of automatic setting are
shown.
[0255] FIG. 53 shows one example of a processing procedure
in which a continuous video frame section, referred to as a
shot, from one change-over of the screen up to the next
change-over of the screen is determined, so that the total
of the display times of the video frames included in the
shot is defined as the sound reproduction time. FIG. 53 is
also established as a function block diagram.
[0256] At step S3101, the shot is detected from the video.
For this purpose, there are used such methods as a method
for detecting a cut of a motion picture from MPEG bit
streams using a tolerance ratio detection method (The
Transactions of the Institute of Electronics, Information
and Communication Engineers, Vol. J82-D-II, No. 3,
pp. 361-370, 1999) and the like.
[0257] At step S3102, the video frame location information
is referred to, thereby investigating which shot the
respective video frames belong to. Furthermore, the display
times of the respective shots are calculated by taking the
total of the display times of the video frames.
[0258] For example, the sound location information is set
as the sound location corresponding to the start of the
shot. The sound reproduction start time may be allowed to
coincide with the display time of the initial video frame
which belongs to each shot, while the sound reproduction
time may be set to be equal to the display time of the
shot. Otherwise, in accordance with the reproduction time
of the sound, the display time of the video frames included
in each shot may be corrected. Although the shot is
detected here, if a data structure is taken wherein the
importance information is described in the frame
information, the section having importance exceeding the
threshold value may be determined by using the importance
with respect to the video frame, so that the sound included
in the section is reproduced.
[0259] If the determined reproduction time does not meet a
given criterion, the sound need not be reproduced.
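
The calculation in FIG. 53 (the total display time per shot used as the sound reproduction time) can be sketched as follows. The input format, a list of `(shot_id, display_time)` pairs, and the reading of the criterion as a minimum duration are assumptions made for this sketch.

```python
from typing import Dict, List, Optional, Tuple


def shot_display_times(frames: List[Tuple[int, float]]) -> Dict[int, float]:
    """Sum the display times of the video frames belonging to each shot.

    `frames` holds one (shot_id, display_time) pair per video frame.
    """
    totals: Dict[int, float] = {}
    for shot_id, display_time in frames:
        totals[shot_id] = totals.get(shot_id, 0.0) + display_time
    return totals


def sound_time_for_shot(total: float, min_time: float) -> Optional[float]:
    """Sound reproduction time for a shot, or None if it is too short.

    Treating the criterion as a minimum duration is an assumption.
    """
    return total if total >= min_time else None
```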

[0260] FIG. 54 shows one example of a processing procedure
in which important words are taken out, by sound
recognition, from sound data corresponding to the shot or
the video frame section having high importance, so that the
words, the sound including the words, or the sound in which
a plurality of words are combined is reproduced. FIG. 54 is
also established as a function block diagram.

[0261] At step S3201, the shot is detected. In place of the
shot, the video frame section having high importance may be
calculated.

[0262] At step S3202, the sound recognition is carried out
with respect to the sound data section corresponding to the
obtained video frame section.
[0263] At step S3203, sounds including the important word
portion or sounds of the important word portion are
determined from the recognition result. In order to select
the important words, an important word dictionary 3204 is
referred to.

[0264] At step S3205, the sound for reproduction is
created. Continuous sounds including the important words
may be used as they are. Only important words may be
extracted. Sounds having a combination of a plurality of
important words may be created.
[0265] At step S3206, in accordance with the reproduction
time of the created sound, the display time of the video
frame is corrected. However, the number of selected words
may be limited and the reproduction time of the sound may
be shortened so that the sound reproduction time is set to
be within the display time of the video frame.

[0266] FIG. 55 shows one example of a procedure in 
which text information is obtained from the telop. FIG. 
55 is also established as a function block diagram. 
[0267] In the processing of FIG. 55, the text informa- 
tion is obtained from the telop or the sound displayed in 
the video. 

[0268] At step S3301, the telop displayed in the video is
read. This includes a method in which the telop in the
original video is automatically extracted, or in which the
telop is read by a person and manually input, using, for
example, a method described in literature such as
"[...] portion from the video for the telop region" by
Osamu Hori (CVIM [...]).

[0269] At step S3302, important words are taken out from
the telop character string which has been read. In the
judgment of important words, an important word dictionary
3303 is used. The telop character string which is read may
be used as text information as it is. Alternatively, the
extracted words are arranged, and a sentence representing
the video frame section may be constituted with only the
important words to provide text information.
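
The dictionary lookup at step S3302 can be sketched as below. Splitting on whitespace, lower-casing and stripping punctuation are simplifying assumptions suited to English text; the function names and the representation of the important word dictionary as a Python set are likewise hypothetical.

```python
from typing import List, Set


def extract_important_words(text: str, dictionary: Set[str]) -> List[str]:
    """Return, in order, the words of `text` found in the dictionary."""
    words = [w.strip(".,!?;:") for w in text.lower().split()]
    return [w for w in words if w in dictionary]


def summarize(text: str, dictionary: Set[str]) -> str:
    """Build text information from the important words only."""
    return " ".join(extract_important_words(text, dictionary))
```

The same functions would serve for a string obtained by sound recognition, since only the source of the input string differs.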
[0270] FIG. 56 shows one example for obtaining the text
information from the sound. FIG. 56 is also established as
a function block diagram.
[0271] In the sound recognition processing at step S3401,
sound is recognized.
[0272] At step S3402, important words are taken out from
the recognized sound data. In the judgment of important
words, an important word dictionary 3403 is used. The
recognized sound data may be used as text information.
Alternatively, the extracted words are arranged, and a
sentence which represents the video frame section is
constituted with only the important words to provide text
information.

[0273] FIG. 57 shows an example of a processing procedure
for taking out text information and preparing the text
information with telop recognition from the shot or from
the video frame section having high importance. FIG. 57 is
also established as a function block diagram.
[0274] At step S3501, the shot is detected from the video.
Instead of the shot, the section having high importance may
be determined.
[0275] At step S3502, the telop represented in the video
frame section is recognized.
[0276] At step S3503, the important words are extracted by
using an important word dictionary 3504.

[0277] At step S3505, text for the display is created. For
this purpose, a telop character string including important
words may be used. Only important words, or a character
string using the important words, may be used as text
information. If text information is obtained by sound
recognition, the telop recognition processing at step S3502
is replaced with sound recognition processing, to which
sound data is input. The text information is displayed
together with the video frame in which the text is
displayed as telop, or with the video frame of the time at
which the data is reproduced as sound. Otherwise, the text
information in the video frame section may be displayed at
one time.

[0278] FIGS. 58A and 58B are views showing a display
example of the text information. As shown in FIG. 58A, the
display may be divided into the text information display
area 3601 and the video display area 3602. As shown in
FIG. 58B, the text information may be overlapped with the
video display area 3603.
[0279] Respective display times (reproduction times) of the
video frame, the sound information and the text information
may be adjusted so that all the media information is
synchronized. For example, at the time of double-speed
reproduction of the video, important sounds are extracted
by the above method, and sound information of half the time
of the normal reproduction is obtained. Next, the display
time is allocated to the video frame associated with the
respective sounds. If the display time of the video frame
is determined so that the scene change quantity becomes
constant, the sound reproduction time or the text display
time is set to be within the display time of the
respectively associated video frames. Otherwise, a section
including a plurality of video frames is determined like
the shot, so that the sound or the text included in the
section is reproduced or displayed in accordance with the
display time of the section.

[0280] So far the explanation has focused mainly on video
data. However, the data structure of the present invention
can be applied to data having no frame information, i.e.,
sound data. It is possible to use sound information and
text information in a form without the frame information.
In this case, a summary is created which comprises only
sound information or text information with respect to the
original video data. In addition, a summary can be created
which comprises only sound information and text information
with respect to the sound data and music data.
[0281] Though the data structures shown in FIGS. 50 and 51
are used to describe the sound information and text
information in synchronization with the video data, it is
possible to summarize the sound data and text data only. To
summarize the sound data, the data structure shown in
FIG. 50 can be used irrespective of the video information.
To summarize the text data, the data structure shown in
FIG. 51 can be used irrespective of the video information.
At that time, in the same manner as in the case of the
frame information, the original data information may be
added to describe a correspondence relationship between
the original sound and music data and the sound information
and text information.
[0282] FIG. 59 shows an example of a data structure in
which the original data information 4901 is included in the
sound information shown in FIG. 50. If the original data is
the video, the original data information 4901 indicates the
section of the video ([...] information 4902 and section
length information 4903).
[0283] If the original data is sound data and music data,
the original data information 4901 indicates the section of
sound and music.

[0284] FIG. 60 shows an example of a data structure 
in which the original data information 4901 is included 
in the sound information shown in FIG. 30. 
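
A sound information item carrying original data information, as in FIG. 59, might be modelled as below, together with a simple way of building a summary from sections of the original. Reading item 4902 as the start of the section, the choice of seconds as the unit, and the policy of taking the first few seconds of every section are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OriginalDataInfo:      # original data information 4901
    start: float             # information 4902 (assumed: section start, s)
    length: float            # section length information 4903, in seconds


@dataclass
class SoundInfo:             # one item of sound information 2800
    location: float          # 2801: where the excerpt lies in the original
    start_time: float        # 2802: start time within the summary
    duration: float          # 2803: reproduction time
    original: OriginalDataInfo


def build_summary(sections: List[OriginalDataInfo],
                  excerpt: float) -> List[SoundInfo]:
    """Take up to `excerpt` seconds from the head of each section."""
    summary: List[SoundInfo] = []
    cursor = 0.0
    for sec in sections:
        duration = min(excerpt, sec.length)
        summary.append(SoundInfo(sec.start, cursor, duration, sec))
        cursor += duration
    return summary
```

Each summary item keeps a reference to the section it was taken from, so a player can jump from the summary back to the corresponding part of the original.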
[0285] FIG. 61 explains an example in which sound/music is
summarized by using the sound information. The original
sound/music is divided into several sections. A portion of
a section is extracted as the summarized sound/music so
that the summary of the original data is created. For
example, a portion 5001 of the section 2 is extracted as
summarized sound/music to be reproduced as a section 5002
of the summary. As an example of a method for dividing the
sections, music may be divided into chapters and
conversation may be divided by contents.

[0286] Furthermore, in the same manner as in the case of
the frame information, the description of the original data
file and the section is included in the sound information
and the text information, with the result that a plurality
of sound/music data items can be summarized together. At
this time, if identification information is added to the
individual original data, the original data identification
information may be described in place of the original data
file and the section.
[0287] FIG. 62 explains an example in which sound/music is
summarized by using the sound information. Portions of
plural sound/music data items are extracted as the
summarized sound/music so that the summary of the original
data is created. For example, a portion 5001 of the
sound/music data item 2 is extracted as summarized
sound/music to be reproduced as a section 5102 of the
summary. As one usage, a portion of each piece of music
included in one music album is extracted, so that
summarized data for trial listening can be created.

[0288] If an album is summarized, the title of the music
may be included in the music information when it is
preferable that the title of the music can be known. This
information is not indispensable.
[0289] Next, a method of providing video data will be
explained.

[0290] If the special reproduction control information
created in the processing of the embodiment is provided for
use by the user, it is necessary to provide the special
reproduction control information from the side of those who
create the information to the side of the user by some
means. As this method of providing the special reproduction
control information, various forms can be considered as
exemplified below:

(1) Video data and special reproduction control
information are recorded on one (or a plurality of)
recording medium (or media) and provided at the same time;

(2) Video data is recorded on one (or a plurality of)
recording medium (or media) and provided, and the special
reproduction control information is separately recorded on
one (or a plurality of) recording medium (or media) and
provided;

(3) Video data and the special reproduction control
information are provided via the communication medium on
the same occasion;

(4) Video data and the special reproduction control
information are provided via the communication media on
different occasions.

[0291] According to the above described embodiments, a
special reproduction control information describing method
for describing special reproduction control information
provided for special reproduction with respect to the video
contents describes, as the frame information, for each of
frames or groups of continuous or adjacent frames
selectively extracted from the whole frame series of video
data constituting the video contents, first information
showing a location at which video data of the one frame or
one group is present and second information associated with
display time allocated to the one frame or the frame group,
and/or third information showing importance allocated to
the one frame or the frame group corresponding to the frame
information.

[0292] According to the above described embodiments, a
computer readable recording medium storing special
reproduction control information stores at least frame
information described for each of frames or groups of
continuous or adjacent frames selectively extracted from
the whole frame series of video data constituting the video
contents, the frame information comprising first
information showing a location at which video data of the
one frame or one group is present and second information
associated with display time allocated to the one frame or
the frame group, and/or third information showing
importance allocated to the one frame or the frame group
corresponding to the frame information.

[0293] According to the above described embodiments, a
special reproduction control information describing
apparatus/method for describing special reproduction control
information provided for special reproduction with respect to
the video contents describes, as the frame information, for
each of frames or groups of continuous or adjacent frames
selectively extracted from the whole frame series of video
data constituting the video contents, video location
information showing a location at which video data of the one
frame or one group is present and display time control
information including display time information and basic
information based on which the display time is calculated, to
be allocated to the one frame or the frame group.
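
By way of a non-limiting illustration, one item of the frame information described above could be modelled as a small record. The following Python sketch is only one possible representation: the names FrameInfo, location, display_time and importance are hypothetical, and the choice of a frame index range for the location and of seconds for the display time is an assumption, not something fixed by the embodiments.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FrameInfo:
    """One item of frame information (all field names are hypothetical)."""

    # First information: where the video data is present. Assumed here to be
    # an inclusive (first, last) frame index range in the source video;
    # first == last denotes a single frame.
    location: Tuple[int, int]
    # Second information: display time in seconds, when given directly.
    display_time: Optional[float] = None
    # Third information: importance allocated to the frame or frame group.
    importance: Optional[float] = None

    def __post_init__(self) -> None:
        first, last = self.location
        if first < 0 or last < first:
            raise ValueError("location must satisfy 0 <= first <= last")
        # The text allows the second and/or the third information,
        # so at least one of the two has to be present.
        if self.display_time is None and self.importance is None:
            raise ValueError("display_time and/or importance is required")


# Special reproduction control information as an ordered list of items.
control_info = [
    FrameInfo(location=(0, 0), display_time=0.5),
    FrameInfo(location=(120, 135), importance=0.8),
    FrameInfo(location=(400, 400), display_time=0.2, importance=0.1),
]
```

An item that carries only an importance value leaves the display time to be derived later on the reproduction side.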
[0294] According to the above described embodiments, in a
special reproduction apparatus/method which enables a special
reproduction with respect to video contents, special
reproduction control information is referred to, which
includes at least frame information including video location
information showing a location at which one frame data or one
frame group data is present, the frame information being
described for each frame, or each frame group comprising a
plurality of continuous or adjacent frames, selectively
extracted out of the whole frame series of the video data
constituting the video contents; the one frame data or the
frame group data corresponding to each frame information is
obtained on the basis of the video location information
included in the frame information, while the display time
which should be allocated to each frame information is
determined on the basis of display time control information
included in at least each frame information; and the one
frame or the plurality of frames which is or are obtained is
reproduced at the determined display time in a predetermined
order, thereby carrying out a special reproduction.
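
The reproduction steps summarized above (referring to the frame information, obtaining the data from the location information, determining the display time, and reproducing in a predetermined order) can be sketched as follows. This is a minimal sketch under stated assumptions: Item and plan_special_reproduction are hypothetical names, the source video is stood in for by a list of frame labels, and the predetermined order is assumed to be simply the order of the list. The function returns a schedule instead of driving a real display.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Item(NamedTuple):
    """Hypothetical frame information item."""
    location: Tuple[int, int]  # inclusive frame index range in the source
    display_time: float        # seconds allocated to this item


def plan_special_reproduction(
    items: Sequence[Item], source: Sequence[str]
) -> List[Tuple[float, float, List[str]]]:
    """Return (start_time, duration, frames) entries in list order."""
    schedule = []
    clock = 0.0
    for item in items:  # assumed predetermined order: list order
        first, last = item.location
        # Obtain the frame data on the basis of the location information.
        frames = list(source[first:last + 1])
        # Determine the display time from the display time control information.
        duration = item.display_time
        schedule.append((clock, duration, frames))
        clock += duration
    return schedule


# Stand-in for decoded source frames (an assumption for illustration).
source = [f"frame{i}" for i in range(10)]
items = [Item((0, 0), 0.5), Item((3, 5), 1.0), Item((9, 9), 0.25)]
schedule = plan_special_reproduction(items, source)
```

A real reproduction apparatus would decode video data at the stated location and hold each entry on screen for its duration; the schedule form merely makes the timing explicit.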
[0295] In the above described embodiments, for example,
location information on the effective video frames used for
display, or image data extracted in frame units from the
original video, is created in advance, and the video frame
location information or information on the display time of
the image data is created separately from the original video.
Either the video frames or the image data extracted from the
original video are continuously displayed on the basis of the
display information so that a special reproduction such as a
double speed reproduction, a trick reproduction, jump
continuous reproduction or the like is enabled.
[0296] In the double speed reproduction for confirming the
contents at a high speed, display time is determined in
advance in such a manner that the display time is extended at
a location where a motion of the scene is large while the
display time is shortened at a location where the motion is
small, so that the change in the display screen becomes as
constant as possible. Alternatively, the same effect can be
obtained even when the location information is determined so
that an interval of the extracted locations is made small at
a location where a motion of the video frame or video data
used for the display is large while the interval is made
large at a location where the motion is small. A reproduction
speed control value may be created so that the double speed
value or the reproduction time designated by a user is
provided as a whole. A long video can be viewed at double
speed reproduction, so that the video can be easily viewed in
a short time, and the contents can be grasped in a short
time.
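
One simple way to make the change in the display screen roughly constant is to allocate display time in proportion to a motion measure and then scale the result to the total reproduction time designated by the user. The Python sketch below assumes such a proportional rule; the function name allocate_display_times, the motion values and the derivation of the total time from a speed value are illustrative assumptions, not the formula of the embodiments.

```python
from typing import List, Sequence


def allocate_display_times(
    motion: Sequence[float], total_time: float
) -> List[float]:
    """Split total_time among items in proportion to their motion measure."""
    total_motion = sum(motion)
    if total_time <= 0 or total_motion <= 0:
        raise ValueError("total_time and total motion must be positive")
    # Larger motion -> longer display time; motion per displayed second
    # is then the same for every item.
    return [total_time * m / total_motion for m in motion]


# Assumed example: a 10 second source viewed at double speed as a whole.
source_duration = 10.0
speed = 2.0
total_time = source_duration / speed  # 5.0 seconds of reproduction

motion = [2.0, 6.0, 1.0, 1.0]  # hypothetical motion measure per item
times = allocate_display_times(motion, total_time)
rates = [m / t for m, t in zip(motion, times)]  # change per displayed second
```

With these numbers the item with the largest motion receives 3.0 of the 5.0 seconds, and every item shows the same amount of change per second. An item with zero motion would receive zero display time under this rule, so a practical implementation would probably impose a minimum.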
[0297] It is possible to reproduce videos so that important
locations are not overlooked by extending the display time at
the important locations and shortening the display time at
unimportant locations in accordance with the importance of
the video.
[0298] Only important locations may be efficiently reproduced
by omitting a part of the video without displaying all of the
video frames.
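
The two uses of importance described above, lengthening the display time at important locations and omitting unimportant ones, can be combined in a short sketch. The linear scaling, the threshold and the name importance_schedule are assumptions chosen for illustration only.

```python
from typing import List, Optional, Sequence


def importance_schedule(
    importances: Sequence[float], base_time: float, threshold: float
) -> List[Optional[float]]:
    """Return a display time per item, or None where the item is omitted."""
    plan: List[Optional[float]] = []
    for importance in importances:
        if importance < threshold:
            plan.append(None)  # unimportant location: not reproduced
        else:
            # Important location: display time grows with importance.
            plan.append(base_time * importance)
    return plan


# Hypothetical importance values for four extracted frames or frame groups.
plan = importance_schedule([0.25, 1.0, 0.5, 2.0], base_time=2.0, threshold=0.5)
```

Here the first item falls below the threshold and is skipped, while the last item, being the most important, is displayed the longest.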

[0299] According to embodiments of the present invention, an
effective special reproduction is enabled on the reproduction
side on the basis of the control information by arranging and
describing, as control information provided for a special
reproduction of the video contents, a plurality of items of
frame information including a method for obtaining a frame or
a group of frames selectively extracted from the original
video, information on the display time (absolute or relative
value) allocated to the frame or the group of frames, and
information which forms the basis for obtaining the
information on the display time.

[0300] For example, each of the above functions can be
realized as software. The above embodiments can be realized
as a computer readable recording medium on which a program is
recorded for allowing the computer to conduct predetermined
means, or for allowing the computer to function as
predetermined means, or for allowing the computer to realize
a predetermined function.
[0301] The structures shown in each of the embodiments are
one example, and are not intended to exclude other
structures. It is also possible to provide a structure which
is obtained by replacing a part of the structure exemplified
above with another structure, omitting a part of the
exemplified structure, adding a different function to the
exemplified structure, or combining such measures. A
different structure logically equivalent to the exemplified
structure, a different structure including a part logically
equivalent to the exemplified structure, and a different
structure logically equivalent to the essential portion of
the exemplified structure can be provided. Another structure
identical or similar to the exemplified structure, or a
different structure having the same effect as the exemplified
structure or a similar effect, can be provided.

[0302] In each of the embodiments, various variations 
with respect to various structure components can be put 
into practice in an appropriate combination. 
[0303] Each of the embodiments includes or inherently
contains inventions associated with various viewpoints,
stages, concepts or categories such as, for example, an
invention as a method for describing information, an
invention as information which is described, an invention as
an apparatus or a method corresponding thereto, and an
invention as an inside of the apparatus or a method
corresponding thereto.

[0304] Consequently, the invention can be extracted 
without being limited to the exemplified structure from 
the content disclosed in the embodiment according to 
this invention. 



Claims 

1. A method of describing frame information, the method
characterized by comprising:

describing, for a frame extracted from a plurality of frames
in a source video data, first information (101) specifying a
location of the extracted frame in the source video data; and
describing, for the extracted frame, second information (102)
relating to a display time of the extracted frame.

2. The method according to claim 1, characterized in that the
extracted frame comprises a group of frames, and the first
information comprises information specifying a location of
the extracted group of frames in the source video data.

3. The method according to claim 1 or 2, characterized by
further comprising describing, for the extracted frame, third
information (122) relating to importance of the extracted
frame.

4. The method according to claim 1, 2 or 3, characterized in
that the first information comprises information specifying
an image data file created from the video data of the
extracted frame.



5. The method according to claim 1, characterized in that the
extracted frame comprises a frame extracted from a plurality
of frames included in a temporal section of the source video
data, and further describing fourth information specifying
the temporal section of the source video data.

6. The method according to claim 5, characterized in that the
first information comprises information specifying an image
data file created from the source video data of the extracted
frame, the image data corresponding to the extracted frame.

7. The method according to any one of the preceding claims,
characterized in that the second information comprises
information relating to such display time that a frame
activity value during a special reproduction is kept
substantially constant.

8. The method according to any one of the preceding claims,
characterized by further comprising describing fifth
information (123) indicating whether the extracted frame is
reproduced or not.

9. The method according to claim 1, characterized in that the
first information comprises one of information specifying a
location of the extracted frame among the plurality of frames
and information specifying a location of image data within an
image data file created from the source video data and stored
separately from the video data, the image data corresponding
to the extracted frame.

10. The method according to any one of the preceding claims,
characterized by further comprising describing, for media
data other than the source video data including the extracted
frame, information specifying a location of the media data
and information relating to a display time of the media data.


11. An article of manufacture comprising a computer usable
medium storing frame information, the frame information
characterized by comprising:

first information (101), described for a frame extracted from
a plurality of frames, specifying a location of the extracted
frame in the source video data; and
second information (102), described for the extracted frame,
relating to a display time of the extracted frame.



12. The article of manufacture according to claim 11,
characterized in that the extracted frame comprises a group
of frames, and the first information comprises information
specifying a location of the extracted group of frames in the
source video data.

13. The article of manufacture according to claim 11 or 12,
characterized in that the frame information comprises third
information (122) relating to importance of the extracted
frame.

14. The article of manufacture according to claim 11, 12 or
13, characterized in that the first information comprises
information specifying an image data file created from the
video data of the extracted frame.

15. The article of manufacture according to claim 11, 
characterized by further comprising storing the 
source video data and an image data file corre- 
sponding to the source video data of the extracted 
frame in addition to the frame information. 

16. An apparatus for creating frame information, the
apparatus characterized by comprising:

a unit configured to extract a frame from a plurality of
frames in a source video data;
a unit configured to create the frame information including
first information specifying a location of the extracted
frame and second information relating to a display time of
the extracted frame; and
a unit configured to link the extracted frame to the frame
information.



17. A method of creating frame information, the method
characterized by comprising:

extracting a frame from a plurality of frames in a source
video data; and
creating the frame information including first information
specifying a location of the extracted frame in the source
video data and second information relating to a display time
of the extracted frame.

18. An apparatus for performing a special reproduction,
characterized by comprising:

a unit configured to refer to frame information described for
a frame extracted from a plurality of frames in a source
video data and including first information specifying a
location of the extracted frame in the source video data and
second information relating to a display time of the
extracted frame;
a unit configured to obtain the video data corresponding to
the extracted frame based on the first information;
a unit configured to determine the display time of the
extracted frame based on the second information; and
a unit configured to display the obtained video data for the
determined display time.

19. A method of performing a special reproduction
characterized by comprising:

referring to frame information described for a frame
extracted from a plurality of frames in a source video data
and including first information (101) specifying a location
of the extracted frame and second information (102) relating
to a display time of the extracted frame;
obtaining the video data corresponding to the extracted frame
based on the first information;
determining the display time of the extracted frame based on
the second information; and
displaying the obtained video data for the determined display
time.

20. An article of manufacture comprising a computer usable
medium having computer readable program code means embodied
therein, the computer readable program code means performing
a special reproduction, the computer readable program code
means characterized by comprising:

computer readable program code means for causing a computer
to refer to frame information described for a frame extracted
from a plurality of frames in a source video data and
including first information (101) specifying a location of
the extracted frame and second information (102) relating to
a display time of the extracted frame;
computer readable program code means for causing a computer
to obtain the video data corresponding to the extracted frame
based on the first information;
computer readable program code means for causing a computer
to determine the display time of the extracted frame based on
the second information; and
computer readable program code means for causing a computer
to display the obtained video data for the determined display
time.

21. A method of describing sound information, the method
characterized by comprising:

describing, for a frame extracted from a plurality of sound
frames in a source sound data, first information specifying a
location of the extracted frame in the source sound data; and
describing, for the extracted frame, second information
relating to a reproduction start time and reproduction time
of the sound data of the extracted frame.



49 EP 1 168 840 A2 50

22. An article of manufacture comprising a computer usable
medium storing frame information, the frame information
characterized by comprising:

first information, described for a frame extracted from a
plurality of sound frames, specifying a location of the
extracted frame in the source sound data; and
second information, described for the extracted frame,
relating to a reproduction start time and reproduction time
of the sound data of the extracted frame.

23. A method of describing text information, the method
characterized by comprising:

describing, for a frame extracted from a plurality of text
frames in a source text data, first information specifying a
location of the extracted frame in the source text data; and
describing, for the extracted frame, second information
relating to a display start time and display time of the text
data of the extracted frame.

24. An article of manufacture comprising a computer usable
medium storing frame information, the frame information
characterized by comprising:

first information, described for a frame extracted from a
plurality of text frames in a source text data, specifying a
location of the extracted frame in the source text data; and
second information, described for the extracted frame,
relating to a display start time and display time of the text
data of the extracted frame.

25. A carrier medium carrying computer readable instructions
for controlling the computer to carry out the method of any
one of claims 1 to 10, 17, 19, 21 and 23.
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